How can end users know the cause of LR slot sync delays?
Hi,
We have seen cases where slot synchronization gets delayed, for example
when the slot is behind the failover standby or vice versa, and the slot
sync worker has to wait for one to catch up with the other. During this
waiting period, users querying pg_replication_slots can only see whether
the slot has been synchronized or not. If it has already synchronized,
that’s fine, but if synchronization is taking longer, users would naturally
want to understand the reason for the delay.
Is there a way for end users to know the cause of slot synchronization
delays, so they can take appropriate actions to speed it up?
I understand that server logs are emitted in such cases, but logs are not
something end users would want to check regularly. Moreover, since logging
is configuration-based, relevant messages may sometimes be skipped or
suppressed.
Thanks & Regards,
Ashutosh Sharma.
On Thu, Aug 28, 2025 at 11:07 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
We have seen cases where slot synchronization gets delayed, for example when the slot is behind the failover standby or vice versa, and the slot sync worker has to wait for one to catch up with the other. During this waiting period, users querying pg_replication_slots can only see whether the slot has been synchronized or not. If it has already synchronized, that’s fine, but if synchronization is taking longer, users would naturally want to understand the reason for the delay.
Is there a way for end users to know the cause of slot synchronization delays, so they can take appropriate actions to speed it up?
I understand that server logs are emitted in such cases, but logs are not something end users would want to check regularly. Moreover, since logging is configuration-based, relevant messages may sometimes be skipped or suppressed.
Currently, the way to see the reason for a sync skip is the server LOGs, but I think
it is better to add a new column like sync_skip_reason in
pg_replication_slots. This could show reasons like
standby_LSN_ahead_remote_LSN. Ideally, users could then compare the
standby's slot LSN/XMIN with those of the remote_slot being synced. Do you have any
better ideas?
--
With Regards,
Amit Kapila.
On Thu, 28 Aug 2025 at 14:56, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Aug 28, 2025 at 11:07 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
We have seen cases where slot synchronization gets delayed, for example when the slot is behind the failover standby or vice versa, and the slot sync worker has to wait for one to catch up with the other. During this waiting period, users querying pg_replication_slots can only see whether the slot has been synchronized or not. If it has already synchronized, that’s fine, but if synchronization is taking longer, users would naturally want to understand the reason for the delay.
Is there a way for end users to know the cause of slot synchronization delays, so they can take appropriate actions to speed it up?
I understand that server logs are emitted in such cases, but logs are not something end users would want to check regularly. Moreover, since logging is configuration-based, relevant messages may sometimes be skipped or suppressed.
Currently, the way to see the reason for sync skip is LOGs but I think
it is better to add a new column like sync_skip_reason in
pg_replication_slots. This can show the reasons like
standby_LSN_ahead_remote_LSN. I think ideally users can compare
standby's slot LSN/XMIN with remote_slot being synced. Do you have any
better ideas?
How about something like pg_stat_progress_replication_slot with remote
LSN/standby LSN/catalog XID etc?
Wouldn't this be in sync with all other debug pg_stat_progress* views
and thus more Postgres-y?
--
Best regards,
Kirill Reshke
On Thu, Aug 28, 2025 at 3:29 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
On Thu, 28 Aug 2025 at 14:56, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Aug 28, 2025 at 11:07 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
We have seen cases where slot synchronization gets delayed, for example when the slot is behind the failover standby or vice versa, and the slot sync worker has to wait for one to catch up with the other. During this waiting period, users querying pg_replication_slots can only see whether the slot has been synchronized or not. If it has already synchronized, that’s fine, but if synchronization is taking longer, users would naturally want to understand the reason for the delay.
Is there a way for end users to know the cause of slot synchronization delays, so they can take appropriate actions to speed it up?
I understand that server logs are emitted in such cases, but logs are not something end users would want to check regularly. Moreover, since logging is configuration-based, relevant messages may sometimes be skipped or suppressed.
Currently, the way to see the reason for sync skip is LOGs but I think
it is better to add a new column like sync_skip_reason in
pg_replication_slots. This can show the reasons like
standby_LSN_ahead_remote_LSN. I think ideally users can compare
standby's slot LSN/XMIN with remote_slot being synced. Do you have any
better ideas?
How about something like pg_stat_progress_replication_slot with remote
LSN/standby LSN/catalog XID etc?
Wouldn't this be in sync with all other debug pg_stat_progress* views
and thus more Postgres-y?
Yes, that is another option. I am a little worried that it is not
always the sync that lags behind, so having a separate view just for sync
progress may be too much. Yet another option is the existing view
pg_stat_replication_slots, but sync progress doesn't directly fit
there. For example, we could add a counter sync_skipped, the time of the
last sync_skip, and last_sync_skip_reason, which should be sufficient to
dig into the problem further.
--
With Regards,
Amit Kapila.
Hi Amit,
On Thu, Aug 28, 2025 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Aug 28, 2025 at 11:07 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
We have seen cases where slot synchronization gets delayed, for example
when the slot is behind the failover standby or vice versa, and the slot
sync worker has to wait for one to catch up with the other. During this
waiting period, users querying pg_replication_slots can only see whether
the slot has been synchronized or not. If it has already synchronized,
that’s fine, but if synchronization is taking longer, users would naturally
want to understand the reason for the delay.
Is there a way for end users to know the cause of slot synchronization
delays, so they can take appropriate actions to speed it up?
I understand that server logs are emitted in such cases, but logs are
not something end users would want to check regularly. Moreover, since
logging is configuration-based, relevant messages may sometimes be skipped
or suppressed.
Currently, the way to see the reason for sync skip is LOGs but I think
it is better to add a new column like sync_skip_reason in
pg_replication_slots. This can show the reasons like
standby_LSN_ahead_remote_LSN. I think ideally users can compare
standby's slot LSN/XMIN with remote_slot being synced. Do you have any
better ideas?
I have similar thoughts, but for clarity, I’d like to outline some of the
key steps I plan to take:
Step 1: Define an enum for all possible reasons slot synchronization was
skipped.
/*
* Reasons why a replication slot sync was skipped.
*/
typedef enum ReplicationSlotSyncSkipReason
{
RS_SYNC_SKIP_NONE = 0, /* No skip */
RS_SYNC_SKIP_REMOTE_BEHIND = (1 << 0), /* Remote slot is behind local
reserved LSN */
RS_SYNC_SKIP_DATA_LOSS = (1 << 1), /* Local slot ahead of remote,
risk of data loss */
RS_SYNC_SKIP_NO_SNAPSHOT = (1 << 2) /* Standby could not build a
consistent snapshot */
} ReplicationSlotSyncSkipReason;
--
Step 2: Introduce a new column in pg_replication_slots to store the skip
reason
/* Inside pg_replication_slots table */
ReplicationSlotSyncSkipReason slot_sync_skip_reason;
--
Step 3: Function to convert enum to human-readable string that can be
stored in pg_replication_slots.
/*
* Convert ReplicationSlotSyncSkipReason bitmask to human-readable string.
*
* Returns a palloc'd string; caller is responsible for freeing it.
*/
static char *
replication_slot_sync_skip_reason_str(ReplicationSlotSyncSkipReason reason)
{
StringInfoData buf;
initStringInfo(&buf);
if (reason == RS_SYNC_SKIP_NONE)
{
appendStringInfoString(&buf, "none");
return buf.data;
}
if (reason & RS_SYNC_SKIP_REMOTE_BEHIND)
appendStringInfoString(&buf, "remote_behind|");
if (reason & RS_SYNC_SKIP_DATA_LOSS)
appendStringInfoString(&buf, "data_loss|");
if (reason & RS_SYNC_SKIP_NO_SNAPSHOT)
appendStringInfoString(&buf, "no_snapshot|");
/* Remove trailing '|' */
if (buf.len > 0 && buf.data[buf.len - 1] == '|')
buf.data[buf.len - 1] = '\0';
return buf.data;
}
--
Step 4: Capture slot_sync_skip_reason whenever the relevant LOG messages
are generated, primarily inside update_local_synced_slot or
update_and_persist_local_synced_slot. This value can later be
exposed via the pg_replication_slots view.
--
Please let me know if you have any objections. I’ll share the WIP patch in
a few days.
--
With Regards,
Ashutosh Sharma.
Hi Ashutosh,
I am also interested in this thread. And was working on a patch for it.
On Wed, 3 Sept 2025 at 17:52, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Amit,
On Thu, Aug 28, 2025 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Aug 28, 2025 at 11:07 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
We have seen cases where slot synchronization gets delayed, for example when the slot is behind the failover standby or vice versa, and the slot sync worker has to wait for one to catch up with the other. During this waiting period, users querying pg_replication_slots can only see whether the slot has been synchronized or not. If it has already synchronized, that’s fine, but if synchronization is taking longer, users would naturally want to understand the reason for the delay.
Is there a way for end users to know the cause of slot synchronization delays, so they can take appropriate actions to speed it up?
I understand that server logs are emitted in such cases, but logs are not something end users would want to check regularly. Moreover, since logging is configuration-based, relevant messages may sometimes be skipped or suppressed.
Currently, the way to see the reason for sync skip is LOGs but I think
it is better to add a new column like sync_skip_reason in
pg_replication_slots. This can show the reasons like
standby_LSN_ahead_remote_LSN. I think ideally users can compare
standby's slot LSN/XMIN with remote_slot being synced. Do you have any
better ideas?
I have similar thoughts, but for clarity, I’d like to outline some of the key steps I plan to take:
Step 1: Define an enum for all possible reasons a slot persistence was skipped.
/*
* Reasons why a replication slot sync was skipped.
*/
typedef enum ReplicationSlotSyncSkipReason
{
RS_SYNC_SKIP_NONE = 0, /* No skip */
RS_SYNC_SKIP_REMOTE_BEHIND = (1 << 0), /* Remote slot is behind local reserved LSN */
RS_SYNC_SKIP_DATA_LOSS = (1 << 1), /* Local slot ahead of remote, risk of data loss */
RS_SYNC_SKIP_NO_SNAPSHOT = (1 << 2) /* Standby could not build a consistent snapshot */
} ReplicationSlotSyncSkipReason;
--
I think we should also add the case when "remote_slot->confirmed_lsn >
latestFlushPtr" (the WAL corresponding to the confirmed_lsn of the remote
slot is still not flushed on the standby). In this case as well, we
skip the slot sync.
Step 2: Introduce new column to pg_replication_slots to store the skip reason
/* Inside pg_replication_slots table */
ReplicationSlotSyncSkipReason slot_sync_skip_reason;
--
As per the discussion [1], I think this is more stats-related data and
we should add it to the pg_stat_replication_slots view. Also, we can
add columns for 'slot sync skip count' and 'last slot sync skip'.
Thoughts?
Step 3: Function to convert enum to human-readable string that can be stored in pg_replication_slots.
/*
* Convert ReplicationSlotSyncSkipReason bitmask to human-readable string.
*
* Returns a palloc'd string; caller is responsible for freeing it.
*/
static char *
replication_slot_sync_skip_reason_str(ReplicationSlotSyncSkipReason reason)
{
StringInfoData buf;
initStringInfo(&buf);
if (reason == RS_SYNC_SKIP_NONE)
{
appendStringInfoString(&buf, "none");
return buf.data;
}
if (reason & RS_SYNC_SKIP_REMOTE_BEHIND)
appendStringInfoString(&buf, "remote_behind|");
if (reason & RS_SYNC_SKIP_DATA_LOSS)
appendStringInfoString(&buf, "data_loss|");
if (reason & RS_SYNC_SKIP_NO_SNAPSHOT)
appendStringInfoString(&buf, "no_snapshot|");
/* Remove trailing '|' */
if (buf.len > 0 && buf.data[buf.len - 1] == '|')
buf.data[buf.len - 1] = '\0';
return buf.data;
}
--
Why are we showing the cause of the slot sync delay as an aggregate of
all causes occurring? I thought we should show the reason for the last
slot sync delay?
Step 4: Capture slot_sync_skip_reason whenever the relevant LOG messages are generated, primarily inside update_local_synced_slot or update_and_persist_local_synced_slot. This value can later be exposed via the pg_replication_slots view.
--
Please let me know if you have any objections. I’ll share the wip patch in a few days.
--
I have attached a patch which I have worked on.
Thanks,
Shlok Kyal
Attachments:
v1-0001-Add-stats-related-to-slot-sync-skip.patch (application/octet-stream)
From 81c9e5e84302a49d44fd89cde15d4f172a752224 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Thu, 4 Sep 2025 19:46:24 +0530
Subject: [PATCH v1] Add stats related to slot sync skip
When slot sync is performed, it can happen that it is skipped for
various reasons. This patch adds stats for synced slots regarding such
slot sync skips. It adds new columns slot_sync_skip_count,
last_slot_sync_skip and slot_sync_skip_reason to the view
pg_stat_replication_slots.
---
contrib/test_decoding/expected/stats.out | 12 +++---
doc/src/sgml/monitoring.sgml | 30 ++++++++++++++
src/backend/catalog/system_views.sql | 3 ++
src/backend/replication/logical/slotsync.c | 10 +++++
src/backend/utils/activity/pgstat_replslot.c | 36 +++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 42 ++++++++++++++++++--
src/include/catalog/pg_proc.dat | 6 +--
src/include/pgstat.h | 5 +++
src/include/replication/slotsync.h | 9 +++++
src/test/regress/expected/rules.out | 5 ++-
10 files changed, 144 insertions(+), 14 deletions(-)
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index de6dc416130..aa75cdd458c 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -78,17 +78,17 @@ SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count,
-- verify accessing/resetting stats for non-existent slot does something reasonable
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | slot_sync_skip_count | last_slot_sync_skip | slot_sync_skip_reason | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------+---------------------+-----------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | none |
(1 row)
SELECT pg_stat_reset_replication_slot('do-not-exist');
ERROR: replication slot "do-not-exist" does not exist
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | slot_sync_skip_count | last_slot_sync_skip | slot_sync_skip_reason | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------+---------------------+-----------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | none |
(1 row)
-- spilling the xact
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3f4a27a736e..2760c2a7535 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1644,6 +1644,36 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slot_sync_skip_count</structfield><type>bigint</type>
+ </para>
+ <para>
+ Number of times the slot sync is skipped.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_slot_sync_skip</structfield><type>timestamp with time zone</type>
+ </para>
+ <para>
+ Time at which last slot sync was skipped.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slot_sync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ Reason of the last slot sync skip.
+ </para>
+ </entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c77fa0234bb..8276c1af2eb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1061,6 +1061,9 @@ CREATE VIEW pg_stat_replication_slots AS
s.stream_bytes,
s.total_txns,
s.total_bytes,
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
+ s.slot_sync_skip_reason,
s.stats_reset
FROM pg_replication_slots as r,
LATERAL pg_stat_get_replication_slot(slot_name) as s
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 9d0072a49ed..dc0a23ce506 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -218,6 +218,9 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
LSN_FORMAT_ARGS(slot->data.restart_lsn),
slot->data.catalog_xmin));
+ /* Update stats for slot sync skip */
+ pgstat_report_replslot_sync_skip(slot, SLOT_SYNC_SKIP_REMOTE_BEHIND);
+
if (remote_slot_precedes)
*remote_slot_precedes = true;
@@ -595,6 +598,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
LSN_FORMAT_ARGS(slot->data.restart_lsn)));
+ /* Update stats for slot sync skip */
+ pgstat_report_replslot_sync_skip(slot, SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT);
+
return false;
}
@@ -646,6 +652,10 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
remote_slot->name,
LSN_FORMAT_ARGS(latestFlushPtr)));
+ /* Update stats for slot sync skip if slot exists on the standby */
+ if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ pgstat_report_replslot_sync_skip(slot, SLOT_SYNC_SKIP_STANDBY_BEHIND);
+
return false;
}
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index ccfb11c49bf..ae576516e44 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -27,6 +27,7 @@
#include "replication/slot.h"
#include "utils/pgstat_internal.h"
+#include "replication/slotsync.h"
static int get_replslot_index(const char *name, bool need_lock);
@@ -101,6 +102,41 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
pgstat_unlock_entry(entry_ref);
}
+/*
+ * Report replication slot sync skip statistics
+ *
+ * We can rely on the stats for the slot to exist and to belong to this
+ * slot. We can only get here if pgstat_create_replslot() or
+ * pgstat_acquire_replslot() have already been called.
+ */
+void
+pgstat_report_replslot_sync_skip(ReplicationSlot *slot, SlotSyncSkipReason reason)
+{
+ PgStat_EntryRef *entry_ref;
+ PgStatShared_ReplSlot *shstatent;
+ PgStat_StatReplSlotEntry *statent;
+
+ entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
+ ReplicationSlotIndex(slot), false);
+ shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
+ statent = &shstatent->stats;
+
+ if (reason != SLOT_SYNC_SKIP_NONE)
+ {
+ statent->slot_sync_skip_count += 1;
+ statent->last_slot_sync_skip = GetCurrentTimestamp();
+ statent->slot_sync_skip_reason = reason;
+ }
+ else
+ {
+ statent->slot_sync_skip_count = 0;
+ statent->last_slot_sync_skip = 0;
+ statent->slot_sync_skip_reason = SLOT_SYNC_SKIP_NONE;
+ }
+
+ pgstat_unlock_entry(entry_ref);
+}
+
/*
* Report replication slot creation.
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index c756c2bebaa..fe8feb87a3e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2093,6 +2093,26 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/* Map a SlotSyncSkipReason enum to a human-readable string */
+static char *
+GetSlotSyncSkipReason(SlotSyncSkipReason reason)
+{
+ switch (reason)
+ {
+ case SLOT_SYNC_SKIP_NONE:
+ return pstrdup("none");
+ case SLOT_SYNC_SKIP_REMOTE_BEHIND:
+ return pstrdup("remote_behind");
+ case SLOT_SYNC_SKIP_STANDBY_BEHIND:
+ return pstrdup("standby_behind");
+ case SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT:
+ return pstrdup("no_consistent_snapshot");
+ }
+
+ Assert(false);
+ return pstrdup("none");
+}
+
/*
* Get the statistics for the replication slot. If the slot statistics is not
* available, return all-zeroes stats.
@@ -2100,7 +2120,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
Datum
pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 13
text *slotname_text = PG_GETARG_TEXT_P(0);
NameData slotname;
TupleDesc tupdesc;
@@ -2129,7 +2149,13 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 10, "slot_sync_skip_count",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 11, "last_slot_sync_skip",
+ TIMESTAMPTZOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 12, "slot_sync_skip_reason",
+ TEXTOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 13, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
@@ -2154,11 +2180,19 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
values[6] = Int64GetDatum(slotent->stream_bytes);
values[7] = Int64GetDatum(slotent->total_txns);
values[8] = Int64GetDatum(slotent->total_bytes);
+ values[9] = Int64GetDatum(slotent->slot_sync_skip_count);
+
+ if (slotent->last_slot_sync_skip == 0)
+ nulls[10] = true;
+ else
+ values[10] = TimestampTzGetDatum(slotent->last_slot_sync_skip);
+
+ values[11] = CStringGetTextDatum(GetSlotSyncSkipReason(slotent->slot_sync_skip_reason));
if (slotent->stat_reset_timestamp == 0)
- nulls[9] = true;
+ nulls[12] = true;
else
- values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+ values[12] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 118d6da1ace..147b93c7a71 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5675,9 +5675,9 @@
{ oid => '6169', descr => 'statistics: information about replication slot',
proname => 'pg_stat_get_replication_slot', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => 'text',
- proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
+ proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz,text,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,slot_sync_skip_count,last_slot_sync_skip,slot_sync_skip_reason,stats_reset}',
prosrc => 'pg_stat_get_replication_slot' },
{ oid => '6230', descr => 'statistics: check if a stats object exists',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f402b17295c..4d1b8fd79a4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -15,6 +15,7 @@
#include "portability/instr_time.h"
#include "postmaster/pgarch.h" /* for MAX_XFN_CHARS */
#include "replication/conflict.h"
+#include "replication/slotsync.h"
#include "utils/backend_progress.h" /* for backward compatibility */ /* IWYU pragma: export */
#include "utils/backend_status.h" /* for backward compatibility */ /* IWYU pragma: export */
#include "utils/pgstat_kind.h"
@@ -395,6 +396,9 @@ typedef struct PgStat_StatReplSlotEntry
PgStat_Counter stream_bytes;
PgStat_Counter total_txns;
PgStat_Counter total_bytes;
+ PgStat_Counter slot_sync_skip_count;
+ TimestampTz last_slot_sync_skip;
+ PgStat_Counter slot_sync_skip_reason;
TimestampTz stat_reset_timestamp;
} PgStat_StatReplSlotEntry;
@@ -736,6 +740,7 @@ extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
extern void pgstat_reset_replslot(const char *name);
struct ReplicationSlot;
extern void pgstat_report_replslot(struct ReplicationSlot *slot, const PgStat_StatReplSlotEntry *repSlotStat);
+extern void pgstat_report_replslot_sync_skip(struct ReplicationSlot *slot, SlotSyncSkipReason reason);
extern void pgstat_create_replslot(struct ReplicationSlot *slot);
extern void pgstat_acquire_replslot(struct ReplicationSlot *slot);
extern void pgstat_drop_replslot(struct ReplicationSlot *slot);
diff --git a/src/include/replication/slotsync.h b/src/include/replication/slotsync.h
index 16b721463dd..359435ff01e 100644
--- a/src/include/replication/slotsync.h
+++ b/src/include/replication/slotsync.h
@@ -23,6 +23,15 @@ extern PGDLLIMPORT bool sync_replication_slots;
extern PGDLLIMPORT char *PrimaryConnInfo;
extern PGDLLIMPORT char *PrimarySlotName;
+typedef enum SlotSyncSkipReason
+{
+ SLOT_SYNC_SKIP_NONE, /* No skip */
+ SLOT_SYNC_SKIP_STANDBY_BEHIND, /* Standby is behind the remote slot */
+ SLOT_SYNC_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not reach a
+ * consistent snapshot */
+} SlotSyncSkipReason;
+
extern char *CheckAndGetDbnameFromConninfo(void);
extern bool ValidateSlotSyncParams(int elevel);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 35e8aad7701..ae0291c06aa 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2140,9 +2140,12 @@ pg_stat_replication_slots| SELECT s.slot_name,
s.stream_bytes,
s.total_txns,
s.total_bytes,
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
+ s.slot_sync_skip_reason,
s.stats_reset
FROM pg_replication_slots r,
- LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
+ LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, slot_sync_skip_count, last_slot_sync_skip, slot_sync_skip_reason, stats_reset)
WHERE (r.datoid IS NOT NULL);
pg_stat_slru| SELECT name,
blks_zeroed,
--
2.34.1
Hi Shlok,
Good to hear that you’re also interested in working on this task.
On Thu, Sep 4, 2025 at 8:26 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
Hi Ashutosh,
I am also interested in this thread. And was working on a patch for it.
On Wed, 3 Sept 2025 at 17:52, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Amit,
On Thu, Aug 28, 2025 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Thu, Aug 28, 2025 at 11:07 AM Ashutosh Sharma <ashu.coek88@gmail.com>
wrote:
We have seen cases where slot synchronization gets delayed, for
example when the slot is behind the failover standby or vice versa, and the
slot sync worker has to wait for one to catch up with the other. During
this waiting period, users querying pg_replication_slots can only see
whether the slot has been synchronized or not. If it has already
synchronized, that’s fine, but if synchronization is taking longer, users
would naturally want to understand the reason for the delay.
Is there a way for end users to know the cause of slot
synchronization delays, so they can take appropriate actions to speed it up?
I understand that server logs are emitted in such cases, but logs are
not something end users would want to check regularly. Moreover, since
logging is configuration-based, relevant messages may sometimes be skipped
or suppressed.
Currently, the way to see the reason for sync skip is LOGs but I think
it is better to add a new column like sync_skip_reason in
pg_replication_slots. This can show the reasons like
standby_LSN_ahead_remote_LSN. I think ideally users can compare
standby's slot LSN/XMIN with remote_slot being synced. Do you have any
better ideas?
I have similar thoughts, but for clarity, I’d like to outline some of
the key steps I plan to take:
Step 1: Define an enum for all possible reasons a slot persistence was
skipped.
/*
* Reasons why a replication slot sync was skipped.
*/
typedef enum ReplicationSlotSyncSkipReason
{
RS_SYNC_SKIP_NONE = 0, /* No skip */
RS_SYNC_SKIP_REMOTE_BEHIND = (1 << 0), /* Remote slot is behind
local reserved LSN */
RS_SYNC_SKIP_DATA_LOSS = (1 << 1), /* Local slot ahead of
remote, risk of data loss */
RS_SYNC_SKIP_NO_SNAPSHOT = (1 << 2) /* Standby could not build a
consistent snapshot */
} ReplicationSlotSyncSkipReason;
--
I think we should also add the case when "remote_slot->confirmed_lsn >
latestFlushPtr" (WAL corresponding to the confirmed lsn on remote slot
is still not flushed on the Standby). In this case as well we are
skipping the slot sync.
Yes, we can include this case as well.
Step 2: Introduce new column to pg_replication_slots to store the skip
reason
/* Inside pg_replication_slots table */
ReplicationSlotSyncSkipReason slot_sync_skip_reason;
--
As per the discussion [1], I think it is more of stat related data and
we should add it in the pg_stat_replication_slots view. Also we can
add columns for 'slot sync skip count' and 'last slot sync skip'.
Thoughts?
It’s not a bad choice, but what makes it a bit confusing for me is that
some of the slot sync information is stored in pg_replication_slots, while
some is in pg_stat_replication_slots.
Is there a possibility that when an end user queries pg_replication_slots,
it shows a particular slot as synced, but querying
pg_stat_replication_slots instead reveals a sync skip reason, or the other
way around?
Moreover, these views are primary data sources for end users, and the
information is critical for their operations. Splitting related information
across multiple views could increase the complexity of their queries.
Step 3: Function to convert the enum to a human-readable string that can be
shown in pg_replication_slots.

/*
 * Convert ReplicationSlotSyncSkipReason bitmask to a human-readable string.
 *
 * Returns a palloc'd string; caller is responsible for freeing it.
 */
static char *
replication_slot_sync_skip_reason_str(ReplicationSlotSyncSkipReason reason)
{
    StringInfoData buf;

    initStringInfo(&buf);

    if (reason == RS_SYNC_SKIP_NONE)
    {
        appendStringInfoString(&buf, "none");
        return buf.data;
    }

    if (reason & RS_SYNC_SKIP_REMOTE_BEHIND)
        appendStringInfoString(&buf, "remote_behind|");
    if (reason & RS_SYNC_SKIP_DATA_LOSS)
        appendStringInfoString(&buf, "data_loss|");
    if (reason & RS_SYNC_SKIP_NO_SNAPSHOT)
        appendStringInfoString(&buf, "no_snapshot|");

    /* Remove trailing '|' */
    if (buf.len > 0 && buf.data[buf.len - 1] == '|')
        buf.data[buf.len - 1] = '\0';

    return buf.data;
}
--
Why are we showing the cause of the slot sync delay as an aggregate of
all causes occurring? I thought we should show only the reason for the
last slot sync delay?
Yes, we should just show the reason for the last sync skip; no
aggregation is needed here.
Step 4: Capture slot_sync_skip_reason whenever the relevant LOG messages
are generated, primarily inside update_local_synced_slot or
update_and_persist_local_synced_slot. This value can later be
exposed through the pg_replication_slots view.
--
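Step 4 amounts to recording the reason at the same points where the skip LOGs are emitted today. A schematic sketch with purely hypothetical helper names and a global stand-in for the per-slot state (the real code would update the slot's entry under its lock):

```c
#include <assert.h>

static int last_skip_reason = 0;    /* stand-in for per-slot shared state */

/* Hypothetical reporting helper invoked next to the existing LOG calls. */
static void
report_slot_sync_skip(int reason)
{
    last_skip_reason = reason;
}

/* Stand-in for the skip branch in update_and_persist_local_synced_slot():
 * if the standby could not build a consistent snapshot, log, record the
 * reason, and skip persisting the slot. */
static int
try_persist_slot(int have_consistent_snapshot)
{
    if (!have_consistent_snapshot)
    {
        /* ereport(LOG, ...) happens here in the real code */
        report_slot_sync_skip(3);   /* e.g. "no consistent snapshot" */
        return 0;                   /* sync skipped */
    }
    report_slot_sync_skip(0);       /* none */
    return 1;                       /* synced */
}
```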
Please let me know if you have any objections. I’ll share the WIP patch
in a few days.
--
I have attached a patch which I have worked on.
Thanks, I will look into it. In fact, I have already looked at it, but
before making any comments, I think we should try to finalize the
approach first.
--
With Regards,
Ashutosh Sharma.
On Fri, Sep 5, 2025 at 12:50 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Good to hear that you’re also interested in working on this task.
On Thu, Sep 4, 2025 at 8:26 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
Hi Ashutosh,
I am also interested in this thread. And was working on a patch for it.
On Wed, 3 Sept 2025 at 17:52, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Amit,
On Thu, Aug 28, 2025 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Aug 28, 2025 at 11:07 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
We have seen cases where slot synchronization gets delayed, for example when the slot is behind the failover standby or vice versa, and the slot sync worker has to wait for one to catch up with the other. During this waiting period, users querying pg_replication_slots can only see whether the slot has been synchronized or not. If it has already synchronized, that’s fine, but if synchronization is taking longer, users would naturally want to understand the reason for the delay.
Is there a way for end users to know the cause of slot synchronization delays, so they can take appropriate actions to speed it up?
I understand that server logs are emitted in such cases, but logs are not something end users would want to check regularly. Moreover, since logging is configuration-based, relevant messages may sometimes be skipped or suppressed.
Currently, the way to see the reason for sync skip is LOGs but I think
it is better to add a new column like sync_skip_reason in
pg_replication_slots. This can show the reasons like
standby_LSN_ahead_remote_LSN. I think ideally users can compare
standby's slot LSN/XMIN with remote_slot being synced. Do you have any
better ideas?

I have similar thoughts, but for clarity, I’d like to outline some of the key steps I plan to take:
Step 1: Define an enum for all possible reasons a slot persistence was skipped.
/*
* Reasons why a replication slot sync was skipped.
*/
typedef enum ReplicationSlotSyncSkipReason
{
RS_SYNC_SKIP_NONE = 0, /* No skip */
RS_SYNC_SKIP_REMOTE_BEHIND = (1 << 0), /* Remote slot is behind local reserved LSN */
RS_SYNC_SKIP_DATA_LOSS = (1 << 1), /* Local slot ahead of remote, risk of data loss */
RS_SYNC_SKIP_NO_SNAPSHOT = (1 << 2) /* Standby could not build a consistent snapshot */
} ReplicationSlotSyncSkipReason;
--
I think we should also add the case when "remote_slot->confirmed_lsn >
latestFlushPtr" (WAL corresponding to the confirmed lsn on remote slot
is still not flushed on the Standby). In this case as well we are
skipping the slot sync.

Yes, we can include this case as well.
Step 2: Introduce new column to pg_replication_slots to store the skip reason
/* Inside pg_replication_slots table */
ReplicationSlotSyncSkipReason slot_sync_skip_reason;
--
As per the discussion [1], I think it is more of stat related data and
we should add it in the pg_stat_replication_slots view. Also we can
add columns for 'slot sync skip count' and 'last slot sync skip'.
Thoughts?

It’s not a bad choice, but what makes it a bit confusing for me is that some of the slot sync information is stored in pg_replication_slots, while some is in pg_stat_replication_slots.
How about keeping sync_skip_reason in pg_replication_slots and
sync_skip_count in pg_stat_replication_slots?
--
With Regards,
Amit Kapila.
Hi Amit,
On Sat, Sep 6, 2025 at 10:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Sep 5, 2025 at 12:50 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Good to hear that you’re also interested in working on this task.
On Thu, Sep 4, 2025 at 8:26 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
Hi Ashutosh,
I am also interested in this thread. And was working on a patch for it.
On Wed, 3 Sept 2025 at 17:52, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Amit,
On Thu, Aug 28, 2025 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Aug 28, 2025 at 11:07 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
We have seen cases where slot synchronization gets delayed, for example when the slot is behind the failover standby or vice versa, and the slot sync worker has to wait for one to catch up with the other. During this waiting period, users querying pg_replication_slots can only see whether the slot has been synchronized or not. If it has already synchronized, that’s fine, but if synchronization is taking longer, users would naturally want to understand the reason for the delay.
Is there a way for end users to know the cause of slot synchronization delays, so they can take appropriate actions to speed it up?
I understand that server logs are emitted in such cases, but logs are not something end users would want to check regularly. Moreover, since logging is configuration-based, relevant messages may sometimes be skipped or suppressed.
Currently, the way to see the reason for sync skip is LOGs but I think
it is better to add a new column like sync_skip_reason in
pg_replication_slots. This can show the reasons like
standby_LSN_ahead_remote_LSN. I think ideally users can compare
standby's slot LSN/XMIN with remote_slot being synced. Do you have any
better ideas?

I have similar thoughts, but for clarity, I’d like to outline some of the key steps I plan to take:
Step 1: Define an enum for all possible reasons a slot persistence was skipped.
/*
* Reasons why a replication slot sync was skipped.
*/
typedef enum ReplicationSlotSyncSkipReason
{
RS_SYNC_SKIP_NONE = 0, /* No skip */
RS_SYNC_SKIP_REMOTE_BEHIND = (1 << 0), /* Remote slot is behind local reserved LSN */
RS_SYNC_SKIP_DATA_LOSS = (1 << 1), /* Local slot ahead of remote, risk of data loss */
RS_SYNC_SKIP_NO_SNAPSHOT = (1 << 2) /* Standby could not build a consistent snapshot */
} ReplicationSlotSyncSkipReason;
--
I think we should also add the case when "remote_slot->confirmed_lsn >
latestFlushPtr" (WAL corresponding to the confirmed lsn on remote slot
is still not flushed on the Standby). In this case as well we are
skipping the slot sync.

Yes, we can include this case as well.
Step 2: Introduce new column to pg_replication_slots to store the skip reason
/* Inside pg_replication_slots table */
ReplicationSlotSyncSkipReason slot_sync_skip_reason;
--
As per the discussion [1], I think it is more of stat related data and
we should add it in the pg_stat_replication_slots view. Also we can
add columns for 'slot sync skip count' and 'last slot sync skip'.
Thoughts?

It’s not a bad choice, but what makes it a bit confusing for me is that some of the slot sync information is stored in pg_replication_slots, while some is in pg_stat_replication_slots.
How about keeping sync_skip_reason in pg_replication_slots and
sync_skip_count in pg_stat_replication_slots?
I think we can do that, since sync_skip_reason appears to be
descriptive metadata rather than statistical data like
slot_sync_skip_count and last_slot_sync_skip. However, it's also true
that all three pieces of data are transient by nature - they only
exist at runtime.
--
With Regards,
Ashutosh Sharma.
Hi Amit,
On Mon, Sep 8, 2025 at 9:52 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Amit,
On Sat, Sep 6, 2025 at 10:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Sep 5, 2025 at 12:50 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Good to hear that you’re also interested in working on this task.
On Thu, Sep 4, 2025 at 8:26 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
Hi Ashutosh,
I am also interested in this thread. And was working on a patch for it.
On Wed, 3 Sept 2025 at 17:52, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Amit,
On Thu, Aug 28, 2025 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Aug 28, 2025 at 11:07 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
We have seen cases where slot synchronization gets delayed, for example when the slot is behind the failover standby or vice versa, and the slot sync worker has to wait for one to catch up with the other. During this waiting period, users querying pg_replication_slots can only see whether the slot has been synchronized or not. If it has already synchronized, that’s fine, but if synchronization is taking longer, users would naturally want to understand the reason for the delay.
Is there a way for end users to know the cause of slot synchronization delays, so they can take appropriate actions to speed it up?
I understand that server logs are emitted in such cases, but logs are not something end users would want to check regularly. Moreover, since logging is configuration-based, relevant messages may sometimes be skipped or suppressed.
Currently, the way to see the reason for sync skip is LOGs but I think
it is better to add a new column like sync_skip_reason in
pg_replication_slots. This can show the reasons like
standby_LSN_ahead_remote_LSN. I think ideally users can compare
standby's slot LSN/XMIN with remote_slot being synced. Do you have any
better ideas?

I have similar thoughts, but for clarity, I’d like to outline some of the key steps I plan to take:
Step 1: Define an enum for all possible reasons a slot persistence was skipped.
/*
* Reasons why a replication slot sync was skipped.
*/
typedef enum ReplicationSlotSyncSkipReason
{
RS_SYNC_SKIP_NONE = 0, /* No skip */
RS_SYNC_SKIP_REMOTE_BEHIND = (1 << 0), /* Remote slot is behind local reserved LSN */
RS_SYNC_SKIP_DATA_LOSS = (1 << 1), /* Local slot ahead of remote, risk of data loss */
RS_SYNC_SKIP_NO_SNAPSHOT = (1 << 2) /* Standby could not build a consistent snapshot */
} ReplicationSlotSyncSkipReason;
--
I think we should also add the case when "remote_slot->confirmed_lsn >
latestFlushPtr" (WAL corresponding to the confirmed lsn on remote slot
is still not flushed on the Standby). In this case as well we are
skipping the slot sync.

Yes, we can include this case as well.
Step 2: Introduce new column to pg_replication_slots to store the skip reason
/* Inside pg_replication_slots table */
ReplicationSlotSyncSkipReason slot_sync_skip_reason;
--
As per the discussion [1], I think it is more of stat related data and
we should add it in the pg_stat_replication_slots view. Also we can
add columns for 'slot sync skip count' and 'last slot sync skip'.
Thoughts?

It’s not a bad choice, but what makes it a bit confusing for me is that some of the slot sync information is stored in pg_replication_slots, while some is in pg_stat_replication_slots.
How about keeping sync_skip_reason in pg_replication_slots and
sync_skip_count in pg_stat_replication_slots?

I think we can do that, since sync_skip_reason appears to be a
descriptive metadata rather than statistical data like
slot_sync_skip_count and last_slot_sync_skip. However, it's also true
that all three pieces of data are transient by nature - they only
exist at runtime.
After spending some more time on this, I found that maintaining
sync_skip_reason in pg_replication_slots would make the code changes a
bit messy and harder to maintain. I think storing all 3 pieces of
information - sync_skip_reason, sync_skip_count, and last_sync_skip in
pg_stat_replication_slots would be a cleaner solution. This way, all
the sync-related info stays together and the code remains
straightforward.
@Shlok, please let me know if you agree with this.
--
With Regards,
Ashutosh Sharma.
On Fri, Sep 12, 2025 at 1:07 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
On Mon, Sep 8, 2025 at 9:52 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
I think we can do that, since sync_skip_reason appears to be a
descriptive metadata rather than statistical data like
slot_sync_skip_count and last_slot_sync_skip. However, it's also true
that all three pieces of data are transient by nature - they will just
be present in the runtime.

After spending some more time on this, I found that maintaining
sync_skip_reason in pg_replication_slots would make the code changes a
bit messy and harder to maintain.
What exactly is your worry? It seems more logical to have
sync_skip_reason in pg_replication_slots and other two in
pg_stat_replication_slots as the latter is purely a stats view and the
sync_skip_count/last_sync_skip suits there better.
I think storing all 3 pieces of
information - sync_skip_reason, sync_skip_count, and last_sync_skip in
pg_stat_replication_slots would be a cleaner solution. This way, all
the sync-related info stays together and the code remains
straightforward.
Having all the sync information together has merit but don't you think
in this case the sync_skip_reason doesn't seem to be matching with the
existing columns in pg_stat_replication_slots?
--
With Regards,
Amit Kapila.
Hi Amit,
On Fri, Sep 12, 2025 at 4:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Sep 12, 2025 at 1:07 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
On Mon, Sep 8, 2025 at 9:52 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
I think we can do that, since sync_skip_reason appears to be a
descriptive metadata rather than statistical data like
slot_sync_skip_count and last_slot_sync_skip. However, it's also true
that all three pieces of data are transient by nature - they will just
be present in the runtime.

After spending some more time on this, I found that maintaining
sync_skip_reason in pg_replication_slots would make the code changes a
bit messy and harder to maintain.

What exactly is your worry? It seems more logical to have
sync_skip_reason in pg_replication_slots and other two in
pg_stat_replication_slots as the latter is purely a stats view and the
sync_skip_count/last_sync_skip suits there better.
The code changes for adding the skip reason to pg_replication_slots
feel a bit hacky compared to the approach for incorporating all three
pieces of information in pg_stat_replication_slots. I thought many
might prefer simplicity over hackiness, which is why having everything
in pg_stat_replication_slots could be more acceptable. That said, we
could maybe prepare a POC patch with this approach as well, compare
the two, and then decide which path to take.
--
With Regards,
Ashutosh Sharma.
Hi Amit, Ashutosh,
On Fri, 12 Sept 2025 at 17:28, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Amit,
On Fri, Sep 12, 2025 at 4:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Sep 12, 2025 at 1:07 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
On Mon, Sep 8, 2025 at 9:52 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
I think we can do that, since sync_skip_reason appears to be a
descriptive metadata rather than statistical data like
slot_sync_skip_count and last_slot_sync_skip. However, it's also true
that all three pieces of data are transient by nature - they will just
be present in the runtime.

After spending some more time on this, I found that maintaining
sync_skip_reason in pg_replication_slots would make the code changes a
bit messy and harder to maintain.

What exactly is your worry? It seems more logical to have
sync_skip_reason in pg_replication_slots and other two in
pg_stat_replication_slots as the latter is purely a stats view and the
sync_skip_count/last_sync_skip suits there better.

The code changes for adding the skip reason to pg_replication_slots
feel a bit hacky compared to the approach for incorporating all three
pieces of information in pg_stat_replication_slots. I thought many
might prefer simplicity over hackiness, which is why having everything
in pg_stat_replication_slots could be more acceptable. That said, we
could maybe prepare a POC patch with this approach as well, compare
the two, and then decide which path to take.
Here are my thoughts on this:
I believe that if we decide to keep skip_reason in
pg_stat_replication_slots, it should mean "reason for the last slot sync
skip", as that makes more sense from a statistics standpoint.
And if we decide to keep skip_reason in pg_replication_slots, it would
be more appropriate to reflect only the latest slot state (it should
display skip_reason only if the current slot sync cycle was skipped).
This is my observation based on the behaviour of current columns in
these views. Thoughts?
I have also attached POC patches for both approaches as per discussion above.
v2_approach1 : It adds all columns 'slot_sync_skip_reason',
'slot_sync_skip_count' and 'last_slot_sync_skip' to
pg_stat_replication_slots
v2_approach2 : It adds column 'slot_sync_skip_reason' to
pg_replication_slots and columns 'slot_sync_skip_count' and
'last_slot_sync_skip' to pg_stat_replication_slots
Thanks,
Shlok Kyal
Attachments:
v2_approach1-0001-Add-stats-related-to-slot-sync-skip.patch
From 8349ab969ec7fa015a6d5e2d4fea6d8ebda74ba0 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Thu, 4 Sep 2025 19:46:24 +0530
Subject: [PATCH v2_approach1] Add stats related to slot sync skip
When slot sync is performed, it can happen that it is skipped due to
various reasons. This patch adds stats for synced slots regarding this
slot sync skip. This patch adds new columns slot_sync_skip_count,
last_slot_sync_skip and slot_sync_skip_reason to the view
pg_stat_replication_slots.
---
contrib/test_decoding/expected/stats.out | 12 +++---
doc/src/sgml/monitoring.sgml | 30 ++++++++++++++
src/backend/catalog/system_views.sql | 3 ++
src/backend/replication/logical/slotsync.c | 25 ++++++++++--
src/backend/utils/activity/pgstat_replslot.c | 23 +++++++++++
src/backend/utils/adt/pgstatfuncs.c | 42 ++++++++++++++++++--
src/include/catalog/pg_proc.dat | 6 +--
src/include/pgstat.h | 5 +++
src/include/replication/slotsync.h | 9 +++++
src/test/regress/expected/rules.out | 5 ++-
10 files changed, 143 insertions(+), 17 deletions(-)
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index de6dc416130..aa75cdd458c 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -78,17 +78,17 @@ SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count,
-- verify accessing/resetting stats for non-existent slot does something reasonable
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | slot_sync_skip_count | last_slot_sync_skip | slot_sync_skip_reason | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------+---------------------+-----------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | none |
(1 row)
SELECT pg_stat_reset_replication_slot('do-not-exist');
ERROR: replication slot "do-not-exist" does not exist
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | slot_sync_skip_count | last_slot_sync_skip | slot_sync_skip_reason | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------+---------------------+-----------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | none |
(1 row)
-- spilling the xact
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3f4a27a736e..2760c2a7535 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1644,6 +1644,36 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slot_sync_skip_count</structfield><type>bigint</type>
+ </para>
+ <para>
+ Number of times the slot sync is skipped.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_slot_sync_skip</structfield><type>timestamp with time zone</type>
+ </para>
+ <para>
+ Time at which last slot sync was skipped.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slot_sync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ Reason of the last slot sync skip.
+ </para>
+ </entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c77fa0234bb..8276c1af2eb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1061,6 +1061,9 @@ CREATE VIEW pg_stat_replication_slots AS
s.stream_bytes,
s.total_txns,
s.total_bytes,
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
+ s.slot_sync_skip_reason,
s.stats_reset
FROM pg_replication_slots as r,
LATERAL pg_stat_get_replication_slot(slot_name) as s
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d55bdb..75bb6346cd4 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -218,6 +218,9 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
LSN_FORMAT_ARGS(slot->data.restart_lsn),
slot->data.catalog_xmin));
+ /* Update stats for slot sync skip */
+ pgstat_report_replslot_sync_skip(slot, SLOT_SYNC_SKIP_REMOTE_BEHIND);
+
if (remote_slot_precedes)
*remote_slot_precedes = true;
@@ -595,6 +598,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
LSN_FORMAT_ARGS(slot->data.restart_lsn)));
+ /* Update stats for slot sync skip */
+ pgstat_report_replslot_sync_skip(slot, SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT);
+
return false;
}
@@ -623,7 +629,7 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
static bool
synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
{
- ReplicationSlot *slot;
+ ReplicationSlot *slot = NULL;
XLogRecPtr latestFlushPtr;
bool slot_updated = false;
@@ -646,11 +652,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
remote_slot->name,
LSN_FORMAT_ARGS(latestFlushPtr)));
- return false;
+ /* If the slot is not present on the local */
+ if (!(slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ return false;
}
/* Search for the named slot */
- if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ if (slot || (slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
bool synced;
@@ -658,6 +666,17 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
synced = slot->data.synced;
SpinLockRelease(&slot->mutex);
+ /*
+ * If standby is behind remote slot and the synced slot is present on
+ * local.
+ */
+ if (remote_slot->confirmed_lsn > latestFlushPtr)
+ {
+ if (synced)
+ pgstat_report_replslot_sync_skip(slot, SLOT_SYNC_SKIP_STANDBY_BEHIND);
+ return false;
+ }
+
/* User-created slot with the same name exists, raise ERROR. */
if (!synced)
ereport(ERROR,
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index ccfb11c49bf..c43c59bcfc0 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -27,6 +27,7 @@
#include "replication/slot.h"
#include "utils/pgstat_internal.h"
+#include "replication/slotsync.h"
static int get_replslot_index(const char *name, bool need_lock);
@@ -101,6 +102,28 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
pgstat_unlock_entry(entry_ref);
}
+/*
+ * Report replication slot sync skip statistics
+ */
+void
+pgstat_report_replslot_sync_skip(ReplicationSlot *slot, SlotSyncSkipReason reason)
+{
+ PgStat_EntryRef *entry_ref;
+ PgStatShared_ReplSlot *shstatent;
+ PgStat_StatReplSlotEntry *statent;
+
+ entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
+ ReplicationSlotIndex(slot), false);
+ shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
+ statent = &shstatent->stats;
+
+ statent->slot_sync_skip_count += 1;
+ statent->last_slot_sync_skip = GetCurrentTimestamp();
+ statent->slot_sync_skip_reason = reason;
+
+ pgstat_unlock_entry(entry_ref);
+}
+
/*
* Report replication slot creation.
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index c756c2bebaa..fe8feb87a3e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2093,6 +2093,26 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/* Map a SlotSyncSkipReason enum to a human-readable string */
+static char *
+GetSlotSyncSkipReason(SlotSyncSkipReason reason)
+{
+ switch (reason)
+ {
+ case SLOT_SYNC_SKIP_NONE:
+ return pstrdup("none");
+ case SLOT_SYNC_SKIP_REMOTE_BEHIND:
+ return pstrdup("remote_behind");
+ case SLOT_SYNC_SKIP_STANDBY_BEHIND:
+ return pstrdup("standby_behind");
+ case SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT:
+ return pstrdup("no_consistent_snapshot");
+ }
+
+ Assert(false);
+ return pstrdup("none");
+}
+
/*
* Get the statistics for the replication slot. If the slot statistics is not
* available, return all-zeroes stats.
@@ -2100,7 +2120,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
Datum
pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 13
text *slotname_text = PG_GETARG_TEXT_P(0);
NameData slotname;
TupleDesc tupdesc;
@@ -2129,7 +2149,13 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 10, "slot_sync_skip_count",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 11, "last_slot_sync_skip",
+ TIMESTAMPTZOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 12, "slot_sync_skip_reason",
+ TEXTOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 13, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
@@ -2154,11 +2180,19 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
values[6] = Int64GetDatum(slotent->stream_bytes);
values[7] = Int64GetDatum(slotent->total_txns);
values[8] = Int64GetDatum(slotent->total_bytes);
+ values[9] = Int64GetDatum(slotent->slot_sync_skip_count);
+
+ if (slotent->last_slot_sync_skip == 0)
+ nulls[10] = true;
+ else
+ values[10] = TimestampTzGetDatum(slotent->last_slot_sync_skip);
+
+ values[11] = CStringGetTextDatum(GetSlotSyncSkipReason(slotent->slot_sync_skip_reason));
if (slotent->stat_reset_timestamp == 0)
- nulls[9] = true;
+ nulls[12] = true;
else
- values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+ values[12] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 03e82d28c87..2f95941f1ec 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5687,9 +5687,9 @@
{ oid => '6169', descr => 'statistics: information about replication slot',
proname => 'pg_stat_get_replication_slot', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => 'text',
- proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
+ proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz,text,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,slot_sync_skip_count,last_slot_sync_skip,slot_sync_skip_reason,stats_reset}',
prosrc => 'pg_stat_get_replication_slot' },
{ oid => '6230', descr => 'statistics: check if a stats object exists',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f402b17295c..4d1b8fd79a4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -15,6 +15,7 @@
#include "portability/instr_time.h"
#include "postmaster/pgarch.h" /* for MAX_XFN_CHARS */
#include "replication/conflict.h"
+#include "replication/slotsync.h"
#include "utils/backend_progress.h" /* for backward compatibility */ /* IWYU pragma: export */
#include "utils/backend_status.h" /* for backward compatibility */ /* IWYU pragma: export */
#include "utils/pgstat_kind.h"
@@ -395,6 +396,9 @@ typedef struct PgStat_StatReplSlotEntry
PgStat_Counter stream_bytes;
PgStat_Counter total_txns;
PgStat_Counter total_bytes;
+ PgStat_Counter slot_sync_skip_count;
+ TimestampTz last_slot_sync_skip;
+ PgStat_Counter slot_sync_skip_reason;
TimestampTz stat_reset_timestamp;
} PgStat_StatReplSlotEntry;
@@ -736,6 +740,7 @@ extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
extern void pgstat_reset_replslot(const char *name);
struct ReplicationSlot;
extern void pgstat_report_replslot(struct ReplicationSlot *slot, const PgStat_StatReplSlotEntry *repSlotStat);
+extern void pgstat_report_replslot_sync_skip(struct ReplicationSlot *slot, SlotSyncSkipReason reason);
extern void pgstat_create_replslot(struct ReplicationSlot *slot);
extern void pgstat_acquire_replslot(struct ReplicationSlot *slot);
extern void pgstat_drop_replslot(struct ReplicationSlot *slot);
diff --git a/src/include/replication/slotsync.h b/src/include/replication/slotsync.h
index 16b721463dd..359435ff01e 100644
--- a/src/include/replication/slotsync.h
+++ b/src/include/replication/slotsync.h
@@ -23,6 +23,15 @@ extern PGDLLIMPORT bool sync_replication_slots;
extern PGDLLIMPORT char *PrimaryConnInfo;
extern PGDLLIMPORT char *PrimarySlotName;
+typedef enum SlotSyncSkipReason
+{
+ SLOT_SYNC_SKIP_NONE, /* No skip */
+ SLOT_SYNC_SKIP_STANDBY_BEHIND, /* Standby is behind the remote slot */
+ SLOT_SYNC_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not reach a
+ * consistent snapshot */
+} SlotSyncSkipReason;
+
extern char *CheckAndGetDbnameFromConninfo(void);
extern bool ValidateSlotSyncParams(int elevel);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 35e8aad7701..ae0291c06aa 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2140,9 +2140,12 @@ pg_stat_replication_slots| SELECT s.slot_name,
s.stream_bytes,
s.total_txns,
s.total_bytes,
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
+ s.slot_sync_skip_reason,
s.stats_reset
FROM pg_replication_slots r,
- LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
+ LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, slot_sync_skip_count, last_slot_sync_skip, slot_sync_skip_reason, stats_reset)
WHERE (r.datoid IS NOT NULL);
pg_stat_slru| SELECT name,
blks_zeroed,
--
2.34.1
Attachment: v2_approach2-0001-Add-stats-related-to-slot-sync-skip.patch (application/octet-stream)
From 31adb46d5263cd91d358b178139172959c86a113 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Mon, 8 Sep 2025 21:04:42 +0530
Subject: [PATCH v2_approach2] Add stats related to slot sync skip
When slot sync is performed, it can be skipped for various reasons.
This patch adds stats about such skips for synced slots: new columns
slot_sync_skip_count and last_slot_sync_skip in the view
pg_stat_replication_slots, and a new column slot_sync_skip_reason in
the view pg_replication_slots.
---
contrib/test_decoding/expected/stats.out | 12 ++--
doc/src/sgml/monitoring.sgml | 20 +++++++
doc/src/sgml/system-views.sgml | 8 +++
src/backend/catalog/system_views.sql | 5 +-
src/backend/replication/logical/slotsync.c | 61 +++++++++++++++++---
src/backend/replication/slotfuncs.c | 26 ++++++++-
src/backend/utils/activity/pgstat_replslot.c | 25 ++++++++
src/backend/utils/adt/pgstatfuncs.c | 18 ++++--
src/include/catalog/pg_proc.dat | 12 ++--
src/include/pgstat.h | 3 +
src/include/replication/slot.h | 17 ++++++
src/test/regress/expected/rules.out | 9 ++-
12 files changed, 188 insertions(+), 28 deletions(-)
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index de6dc416130..c1ff872c08c 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -78,17 +78,17 @@ SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count,
-- verify accessing/resetting stats for non-existent slot does something reasonable
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | slot_sync_skip_count | last_slot_sync_skip | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------+---------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
SELECT pg_stat_reset_replication_slot('do-not-exist');
ERROR: replication slot "do-not-exist" does not exist
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | slot_sync_skip_count | last_slot_sync_skip | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------+---------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
-- spilling the xact
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3f4a27a736e..76bad0d7f2a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1644,6 +1644,26 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slot_sync_skip_count</structfield><type>bigint</type>
+ </para>
+ <para>
+ Number of times the slot sync is skipped.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_slot_sync_skip</structfield><type>timestamp with time zone</type>
+ </para>
+ <para>
+ Time at which last slot sync was skipped.
+ </para>
+ </entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 4187191ea74..6b93c97ddd9 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3036,6 +3036,14 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slot_sync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ Reason of the last slot sync skip.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c77fa0234bb..abca2f5f927 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1046,7 +1046,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.slot_sync_skip_reason
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
@@ -1061,6 +1062,8 @@ CREATE VIEW pg_stat_replication_slots AS
s.stream_bytes,
s.total_txns,
s.total_bytes,
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
s.stats_reset
FROM pg_replication_slots as r,
LATERAL pg_stat_get_replication_slot(slot_name) as s
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d55bdb..29fac502a8a 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -148,6 +148,23 @@ typedef struct RemoteSlot
static void slotsync_failure_callback(int code, Datum arg);
static void update_synced_slots_inactive_since(void);
+/* Update slot sync skip stats */
+static void
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason)
+{
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slot when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SLOT_SYNC_SKIP_NONE)
+ pgstat_report_replslot_sync_skip(slot);
+
+ /* Update the slot sync reason */
+ SpinLockAcquire(&slot->mutex);
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
+}
+
/*
* If necessary, update the local synced slot's metadata based on the data
* from the remote slot.
@@ -165,7 +182,8 @@ static void update_synced_slots_inactive_since(void);
static bool
update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
bool *found_consistent_snapshot,
- bool *remote_slot_precedes)
+ bool *remote_slot_precedes,
+ SlotSyncSkipReason * skip_reason)
{
ReplicationSlot *slot = MyReplicationSlot;
bool updated_xmin_or_lsn = false;
@@ -218,6 +236,8 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
LSN_FORMAT_ARGS(slot->data.restart_lsn),
slot->data.catalog_xmin));
+ *skip_reason = SLOT_SYNC_SKIP_REMOTE_BEHIND;
+
if (remote_slot_precedes)
*remote_slot_precedes = true;
@@ -557,7 +577,8 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
* false.
*/
static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+ SlotSyncSkipReason * skip_reason)
{
ReplicationSlot *slot = MyReplicationSlot;
bool found_consistent_snapshot = false;
@@ -565,7 +586,7 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
(void) update_local_synced_slot(remote_slot, remote_dbid,
&found_consistent_snapshot,
- &remote_slot_precedes);
+ &remote_slot_precedes, skip_reason);
/*
* Check if the primary server has caught up. Refer to the comment atop
@@ -595,6 +616,8 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
LSN_FORMAT_ARGS(slot->data.restart_lsn)));
+ *skip_reason = SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT;
+
return false;
}
@@ -626,6 +649,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlot *slot;
XLogRecPtr latestFlushPtr;
bool slot_updated = false;
+ SlotSyncSkipReason skip_reason = SLOT_SYNC_SKIP_NONE;
/*
* Make sure that concerned WAL is received and flushed before syncing
@@ -646,7 +670,11 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
remote_slot->name,
LSN_FORMAT_ARGS(latestFlushPtr)));
- return false;
+ /* If slot is present on the local, update the slot sync skip stats */
+ if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ skip_reason = SLOT_SYNC_SKIP_STANDBY_BEHIND;
+ else
+ return false;
}
/* Search for the named slot */
@@ -658,6 +686,17 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
synced = slot->data.synced;
SpinLockRelease(&slot->mutex);
+ /*
+ * If standby is behind remote slot and the synced slot is present on
+ * local.
+ */
+ if (remote_slot->confirmed_lsn > latestFlushPtr)
+ {
+ if (synced)
+ update_slot_sync_skip_stats(slot, skip_reason);
+ return false;
+ }
+
/* User-created slot with the same name exists, raise ERROR. */
if (!synced)
ereport(ERROR,
@@ -715,7 +754,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
if (slot->data.persistency == RS_TEMPORARY)
{
slot_updated = update_and_persist_local_synced_slot(remote_slot,
- remote_dbid);
+ remote_dbid,
+ &skip_reason);
}
/* Slot ready for sync, so sync it. */
@@ -737,7 +777,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)));
slot_updated = update_local_synced_slot(remote_slot, remote_dbid,
- NULL, NULL);
+ NULL, NULL, &skip_reason);
}
}
/* Otherwise create the slot first. */
@@ -784,11 +824,18 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlotsComputeRequiredXmin(true);
LWLockRelease(ProcArrayLock);
- update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+ update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+ &skip_reason);
slot_updated = true;
}
+ /*
+ * If slot sync is skipped update the reason and stats. Else reset the
+ * reason to 'none' on successful slot sync.
+ */
+ update_slot_sync_skip_stats(slot, skip_reason);
+
ReplicationSlotRelease();
return slot_updated;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index b8f21153e7b..0033c03286f 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -228,6 +228,28 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * Map a SlotSyncSkipReason enum to a human-readable string
+ */
+static char *
+GetSlotSyncSkipReason(SlotSyncSkipReason reason)
+{
+ switch (reason)
+ {
+ case SLOT_SYNC_SKIP_NONE:
+ return pstrdup("none");
+ case SLOT_SYNC_SKIP_REMOTE_BEHIND:
+ return pstrdup("remote_behind");
+ case SLOT_SYNC_SKIP_STANDBY_BEHIND:
+ return pstrdup("standby_behind");
+ case SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT:
+ return pstrdup("no_consistent_snapshot");
+ }
+
+ Assert(false);
+ return pstrdup("none");
+}
+
/*
* pg_get_replication_slots - SQL SRF showing all replication slots
* that currently exist on the database cluster.
@@ -235,7 +257,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +465,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ values[i++] = CStringGetTextDatum(GetSlotSyncSkipReason(slot_contents.slot_sync_skip_reason));
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index ccfb11c49bf..bf436472b8d 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -101,6 +101,31 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
pgstat_unlock_entry(entry_ref);
}
+/*
+ * Report replication slot sync skip statistics.
+ *
+ * We can rely on the stats for the slot to exist and to belong to this
+ * slot. We can only get here if pgstat_create_replslot() or
+ * pgstat_acquire_replslot() have already been called.
+ */
+void
+pgstat_report_replslot_sync_skip(ReplicationSlot *slot)
+{
+ PgStat_EntryRef *entry_ref;
+ PgStatShared_ReplSlot *shstatent;
+ PgStat_StatReplSlotEntry *statent;
+
+ entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
+ ReplicationSlotIndex(slot), false);
+ shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
+ statent = &shstatent->stats;
+
+ statent->slot_sync_skip_count += 1;
+ statent->last_slot_sync_skip = GetCurrentTimestamp();
+
+ pgstat_unlock_entry(entry_ref);
+}
+
/*
* Report replication slot creation.
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index c756c2bebaa..15500a77701 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2100,7 +2100,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
Datum
pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 12
text *slotname_text = PG_GETARG_TEXT_P(0);
NameData slotname;
TupleDesc tupdesc;
@@ -2129,7 +2129,11 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 10, "slot_sync_skip_count",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 11, "last_slot_sync_skip",
+ TIMESTAMPTZOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 12, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
@@ -2154,11 +2158,17 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
values[6] = Int64GetDatum(slotent->stream_bytes);
values[7] = Int64GetDatum(slotent->total_txns);
values[8] = Int64GetDatum(slotent->total_bytes);
+ values[9] = Int64GetDatum(slotent->slot_sync_skip_count);
+
+ if (slotent->last_slot_sync_skip == 0)
+ nulls[10] = true;
+ else
+ values[10] = TimestampTzGetDatum(slotent->last_slot_sync_skip);
if (slotent->stat_reset_timestamp == 0)
- nulls[9] = true;
+ nulls[11] = true;
else
- values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+ values[11] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 03e82d28c87..d31a52b1fd4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5687,9 +5687,9 @@
{ oid => '6169', descr => 'statistics: information about replication slot',
proname => 'pg_stat_get_replication_slot', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => 'text',
- proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
+ proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,slot_sync_skip_count,last_slot_sync_skip,stats_reset}',
prosrc => 'pg_stat_get_replication_slot' },
{ oid => '6230', descr => 'statistics: check if a stats object exists',
@@ -11503,9 +11503,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slot_sync_skip_reason}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f402b17295c..906d7b7799f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -395,6 +395,8 @@ typedef struct PgStat_StatReplSlotEntry
PgStat_Counter stream_bytes;
PgStat_Counter total_txns;
PgStat_Counter total_bytes;
+ PgStat_Counter slot_sync_skip_count;
+ TimestampTz last_slot_sync_skip;
TimestampTz stat_reset_timestamp;
} PgStat_StatReplSlotEntry;
@@ -736,6 +738,7 @@ extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
extern void pgstat_reset_replslot(const char *name);
struct ReplicationSlot;
extern void pgstat_report_replslot(struct ReplicationSlot *slot, const PgStat_StatReplSlotEntry *repSlotStat);
+extern void pgstat_report_replslot_sync_skip(struct ReplicationSlot *slot);
extern void pgstat_create_replslot(struct ReplicationSlot *slot);
extern void pgstat_acquire_replslot(struct ReplicationSlot *slot);
extern void pgstat_drop_replslot(struct ReplicationSlot *slot);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index fe62162cde3..09932747bd3 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -71,6 +71,20 @@ typedef enum ReplicationSlotInvalidationCause
/* Maximum number of invalidation causes */
#define RS_INVAL_MAX_CAUSES 4
+/*
+ * When slot sync worker is running or pg_sync_replication_slots is run, the
+ * slot sync can be skipped. This enum keeps a list of reasons of slot sync
+ * skip.
+ */
+typedef enum SlotSyncSkipReason
+{
+ SLOT_SYNC_SKIP_NONE, /* No skip */
+ SLOT_SYNC_SKIP_STANDBY_BEHIND, /* Standby is behind the remote slot */
+ SLOT_SYNC_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not reach a
+ * consistent snapshot */
+} SlotSyncSkipReason;
+
/*
* On-Disk data of a replication slot, preserved across restarts.
*/
@@ -249,6 +263,9 @@ typedef struct ReplicationSlot
*/
XLogRecPtr last_saved_restart_lsn;
+ /* The reason for last slot sync skip */
+ SlotSyncSkipReason slot_sync_skip_reason;
+
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 35e8aad7701..041d3328328 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1499,8 +1499,9 @@ pg_replication_slots| SELECT l.slot_name,
l.conflicting,
l.invalidation_reason,
l.failover,
- l.synced
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
+ l.synced,
+ l.slot_sync_skip_reason
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, slot_sync_skip_reason)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
@@ -2140,9 +2141,11 @@ pg_stat_replication_slots| SELECT s.slot_name,
s.stream_bytes,
s.total_txns,
s.total_bytes,
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
s.stats_reset
FROM pg_replication_slots r,
- LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
+ LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, slot_sync_skip_count, last_slot_sync_skip, stats_reset)
WHERE (r.datoid IS NOT NULL);
pg_stat_slru| SELECT name,
blks_zeroed,
--
2.34.1
Dear Shlok,
Thanks for creating the patch. Personally I prefer approach2; approach1 cannot
indicate the current status of synchronization, it just shows the history.
I feel approach2 has more information than approach1.
Below are my comments for your patch.
01.
```
+ Number of times the slot sync is skipped.
```
"slot sync" should be "slot synchronization". The same can be said for the other
attributes.
02.
```
+ Reason of the last slot sync skip.
```
Possible values must be clarified.
03.
```
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
+ s.slot_sync_skip_reason,
```
Last line has tab-blank but others have space-blank.
04.
```
+typedef enum SlotSyncSkipReason
+{
+ SLOT_SYNC_SKIP_NONE, /* No skip */
+ SLOT_SYNC_SKIP_STANDBY_BEHIND, /* Standby is behind the remote slot */
+ SLOT_SYNC_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not reach a
+ * consistent snapshot */
+} SlotSyncSkipReason
```
a.
Can we add comment atop the enum?
b.
SLOT_SYNC_SKIP_STANDBY_BEHIND is misleading; it indicates the standby server has
not received the WAL required by the slot, but the enum name does not make that clear.
How about SLOT_SYNC_SKIP_MISSING_WAL_RECORDS or something? Better names are very
welcome.
c
s/reach/build/.
05.
```
@@ -646,11 +652,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
remote_slot->name,
LSN_FORMAT_ARGS(latestFlushPtr)));
- return false;
+ /* If the slot is not present on the local */
+ if (!(slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ return false;
}
```
It looks like the statistics cannot be updated if users try to sync
slots via the SQL interface.
06. update_slot_sync_skip_stats
```
+ /* Update the slot sync reason */
+ SpinLockAcquire(&slot->mutex);
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
```
I feel there is no need to update the reason if slot->slot_sync_skip_reason
and skip_reason are already the same.
07. synchronize_one_slot
```
+ /*
+ * If standby is behind remote slot and the synced slot is present on
+ * local.
+ */
+ if (remote_slot->confirmed_lsn > latestFlushPtr)
+ {
+ if (synced)
+ update_slot_sync_skip_stats(slot, skip_reason);
+ return false;
+ }
```
This condition already exists in the same function; can we combine them?
08. GetSlotSyncSkipReason()
Do we have to do pstrdup() here? I found a similar function get_snapbuild_state_desc(),
and it does not use pstrdup().
09.
Can you consider adding a test for the new code?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Wed, Sep 17, 2025 at 4:24 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Shlok,
Thanks for creating the patch. Personally I prefer approach2; approach1 cannot
indicate the current status of synchronization, it just shows the history.
I feel approach2 has more information than approach1.
I also think so but Ashutosh thought that it would be hacky. Ashutosh,
did you have an opinion on this matter after seeing the patches?
--
With Regards,
Amit Kapila.
Hi Amit,
On Wed, Sep 17, 2025 at 5:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 17, 2025 at 4:24 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Shlok,
Thanks for creating the patch. Personally I prefer approach2; approach1 cannot
indicate the current status of synchronization, it just shows the history.
I feel approach2 has more information than approach1.
I also think so but Ashutosh thought that it would be hacky. Ashutosh,
did you have an opinion on this matter after seeing the patches?
Yes, I’ve looked into both the patches. Approach 1 seems quite
straightforward. In approach 2, we need to pass some additional
arguments to update_local_sync_slot and
update_and_persist_local_synced_slot, which makes it feel a little
less clean compared to approach 1, where we simply add a new function
and call it directly. That said, this is just my view on code
cleanliness; I’m fine with proceeding with approach 2 if that’s
considered the better option.
--
With Regards,
Ashutosh Sharma.
On Wed, Sep 17, 2025 at 8:19 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
On Wed, Sep 17, 2025 at 5:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 17, 2025 at 4:24 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Shlok,
Thanks for creating the patch. Personally I prefer approach2; approach1 cannot
indicate the current status of synchronization, it just shows the history.
I feel approach2 has more information than approach1.
I also think so but Ashutosh thought that it would be hacky. Ashutosh,
did you have an opinion on this matter after seeing the patches?
Yes, I’ve looked into both the patches. Approach 1 seems quite
straightforward. In approach 2, we need to pass some additional
arguments to update_local_sync_slot and
update_and_persist_local_synced_slot, which makes it feel a little
less clean compared to approach 1, where we simply add a new function
and call it directly.
This is because approach-1 doesn't show the latest value of
sync_status. I mean, if the sync is successful in the latest cycle, it
won't update the stats, which I am not sure is correct because users
may want to know the recent status of the sync cycle. Otherwise, the patch
should be almost the same. I think we can even try to write a patch
for approach-2 without an additional out parameter in some of the
functions.
--
With Regards,
Amit Kapila.
Hi Amit,
On Thu, Sep 18, 2025 at 11:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 17, 2025 at 8:19 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
On Wed, Sep 17, 2025 at 5:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 17, 2025 at 4:24 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Shlok,
Thanks for creating the patch. Personally I prefer approach2; approach1 cannot
indicate the current status of synchronization, it just shows the history.
I feel approach2 has more information than approach1.
I also think so but Ashutosh thought that it would be hacky. Ashutosh,
did you have an opinion on this matter after seeing the patches?
Yes, I’ve looked into both the patches. Approach 1 seems quite
straightforward. In approach 2, we need to pass some additional
arguments to update_local_sync_slot and
update_and_persist_local_synced_slot, which makes it feel a little
less clean compared to approach 1, where we simply add a new function
and call it directly.
This is because the approach-1 doesn't show the latest value of
sync_status. I mean in the latest cycle if the sync is successful, it
won't update the stats which I am not sure is correct because users
may want to know the recent status of sync cycle. Otherwise, the patch
should be almost the same.
This should be manageable, no? If we add an additional call to the
stats report function immediately after ReplicationSlotPersist(),
wouldn’t that address the issue? Please correct me if I’m overlooking
something.
@@ -600,6 +600,8 @@ update_and_persist_local_synced_slot(RemoteSlot
*remote_slot, Oid remote_dbid)
ReplicationSlotPersist();
+ pgstat_report_replslot_sync_skip(slot, SLOT_SYNC_SKIP_NONE);
+
ereport(LOG,
errmsg("newly created replication slot \"%s\"
is sync-ready now",
remote_slot->name));
In addition to this, should anyone really need to query the skip
reason if pg_replication_slots already shows that the slot is synced
and not temporary? Ideally, users should check the slot status in
pg_replication_slots, and if it indicates the slot is persisted, there
seems little value in querying pg_stat_replication_slots for the skip
reason. That said, it’s important to ensure the information in both
views remains consistent.
I think we can even try to write a patch
for approach-2 without an additional out parameter in some of the
functions.
We can aim for this, if possible.
--
With Regards,
Ashutosh Sharma.
Hi,
@@ -646,7 +670,11 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid
remote_dbid)
remote_slot->name,
LSN_FORMAT_ARGS(latestFlushPtr)));
- return false;
+ /* If slot is present on the local, update the slot sync skip stats */
+ if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ skip_reason = SLOT_SYNC_SKIP_STANDBY_BEHIND;
+ else
+ return false;
With this change, you’re likely enforcing sync slot creation, whereas
earlier that might not have been the case. This introduces a
behavioral change, which may not be well received.
--
I think we can avoid passing skip_reason as a new argument to
update_local_synced_slot(). It only needs to be passed to
update_and_persist_local_synced_slot(). When
update_local_synced_slot() is invoked from within
update_and_persist_local_synced_slot(), we can simply rely on the two
flags, remote_slot_precedes and found_consistent_snapshot and set the
skip_reason accordingly, thoughts?
If update_local_synced_slot is being called from any other place that
means the slot is already persisted.
--
+typedef enum SlotSyncSkipReason
+{
+ SLOT_SYNC_SKIP_NONE, /* No skip */
+ SLOT_SYNC_SKIP_STANDBY_BEHIND, /* Standby is behind the remote slot */
+ SLOT_SYNC_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not reach a
+ * consistent snapshot */
+} SlotSyncSkipReason;
+
I would suggest shortening the enum names like maybe SS_SKIP_NONE
instead of SLOT_SYNC_SKIP_NONE.
--
With Regards,
Ashutosh Sharma.
On Thu, 18 Sept 2025 at 13:17, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Amit,
On Thu, Sep 18, 2025 at 11:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 17, 2025 at 8:19 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
On Wed, Sep 17, 2025 at 5:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 17, 2025 at 4:24 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:Dear Shlok,
Thanks for creating the patch. Personally I prefer approach2; approach1 cannot
indicate the current status of synchronization, it just shows the history.
I feel approach2 has more information than approach1.
I also think so but Ashutosh thought that it would be hacky. Ashutosh,
did you have an opinion on this matter after seeing the patches?
Yes, I’ve looked into both the patches. Approach 1 seems quite
straightforward. In approach 2, we need to pass some additional
arguments to update_local_sync_slot and
update_and_persist_local_synced_slot, which makes it feel a little
less clean compared to approach 1, where we simply add a new function
and call it directly.
This is because the approach-1 doesn't show the latest value of
sync_status. I mean in the latest cycle if the sync is successful, it
won't update the stats which I am not sure is correct because users
may want to know the recent status of sync cycle. Otherwise, the patch
should be almost the same.
This should be manageable, no? If we add an additional call to the
stats report function immediately after ReplicationSlotPersist(),
wouldn’t that address the issue? Please correct me if I’m overlooking
something.
@@ -600,6 +600,8 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlotPersist();
+ pgstat_report_replslot_sync_skip(slot, SLOT_SYNC_SKIP_NONE);
+
ereport(LOG,
errmsg("newly created replication slot \"%s\" is sync-ready now",
remote_slot->name));
Currently, in this code change, the skip_reason is updated to 'none'
only when the slot state changes from temporary to persistent.
However, this logic does not handle cases where the slot is already a
persistent sync slot.
I believe sync skips can also happen for persistent slots.
That means, on a successful slot sync, we should update the
skip_reason to 'none' even for slots that are already persistent.
In addition to this, should anyone really need to query the skip
reason if pg_replication_slots already shows that the slot is synced
and not temporary? Ideally, users should check the slot status in
pg_replication_slots, and if it indicates the slot is persisted, there
seems little value in enquiring pg_stat_replication_slots for the skip
reason. That said, it’s important to ensure the information in both
views remains consistent.
I have a doubt. Why don't we want to report the sync skip reason once
the slots are persisted?
for the case:
latestFlushPtr = GetStandbyFlushRecPtr(NULL);
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
* primary server was not configured correctly.
*/
ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("skipping slot synchronization because the
received slot sync"
" LSN %X/%08X for slot \"%s\" is ahead of the
standby position %X/%08X",
LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
remote_slot->name,
LSN_FORMAT_ARGS(latestFlushPtr)));
return false;
}
Slot sync skip can happen even for persistent slots. So why should we
avoid displaying the skip reason in such cases?
I checked that if the synchronized_standby_slots GUC is not set
properly, we can hit this condition even for persistent slots.
I think we should still display the skip reason if the user does not
configure this GUC as expected.
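For reference, the code path above is reached when the primary's synchronized_standby_slots does not list the physical slot this standby streams through. A minimal sketch of the expected configuration (the slot name sb1_slot follows the TAP test in this thread and is illustrative):

```
# postgresql.conf on the primary: ensure failover logical slots do not
# advance past the position confirmed via this standby's physical slot
synchronized_standby_slots = 'sb1_slot'
```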
I think we can even try to write a patch
for approach-2 without an additional out parameter in some of the
functions.
We can aim for this, if possible.
Thanks,
Shlok Kyal
On Mon, Sep 22, 2025 at 3:41 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
On Thu, 18 Sept 2025 at 13:17, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
In addition to this, should anyone really need to query the skip
reason if pg_replication_slots already shows that the slot is synced
and not temporary? Ideally, users should check the slot status in
pg_replication_slots, and if it indicates the slot is persisted, there
seems little value in enquiring pg_stat_replication_slots for the skip
reason. That said, it’s important to ensure the information in both
views remains consistent.
I have a doubt. Why don't we want to report the sync skip reason once
the slots are persisted?
for the case:
latestFlushPtr = GetStandbyFlushRecPtr(NULL);
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
* primary server was not configured correctly.
*/
ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("skipping slot synchronization because the
received slot sync"
" LSN %X/%08X for slot \"%s\" is ahead of the
standby position %X/%08X",
LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
remote_slot->name,
LSN_FORMAT_ARGS(latestFlushPtr)));
return false;
}
Slot sync skip can happen even for persistent slots. So why should we
avoid displaying the skip reason in such cases?
We should display the skip reason even for persistent slots and clear
the same after a successful sync.
--
With Regards,
Amit Kapila.
On Wed, 17 Sept 2025 at 16:24, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Shlok,
Thanks for creating the patch. Personally I prefer approach2; approach1 cannot
indicate the current status of synchronization, it just shows the history.
I feel approach2 has more information than approach1.
I agree with your point. I think the preferred behaviour of
slot_sync_skip_reason is to indicate the current status of
synchronization. And taking into account the current behaviour of the
columns in the pg_replication_slots and pg_stat_replication_slots
views, I think slot_sync_skip_reason is suited for the
pg_replication_slots view, and I am proceeding ahead with approach2.
Below are my comments for your patch.
01.
```
+ Number of times the slot sync is skipped.
```
"slot sync" should be "slot synchronization". The same can be said for other
attributes.

02.
```
+ Reason of the last slot sync skip.
```
Possible values must be clarified.

03.
```
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
+ s.slot_sync_skip_reason,
```
The last line is indented with a tab but the others with spaces.

04.
```
+typedef enum SlotSyncSkipReason
+{
+ SLOT_SYNC_SKIP_NONE, /* No skip */
+ SLOT_SYNC_SKIP_STANDBY_BEHIND, /* Standby is behind the remote slot */
+ SLOT_SYNC_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not reach a
+ * consistent snapshot */
+} SlotSyncSkipReason
```
a.
Can we add a comment atop the enum?
b.
SLOT_SYNC_SKIP_STANDBY_BEHIND is misleading; it indicates the standby server has
not received the WALs required by the slot, but the enum name does not clarify it.
How about SLOT_SYNC_SKIP_MISSING_WAL_RECORDS or something? Better names are very
welcome.
c.
s/reach/build/.

05.
```
@@ -646,11 +652,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
remote_slot->name,
LSN_FORMAT_ARGS(latestFlushPtr)));
- return false;
+ /* If the slot is not present on the local */
+ if (!(slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ return false;
}
```
Looks like if users try to sync slots via SQL interfaces, the statistics cannot
be updated.

06. update_slot_sync_skip_stats
```
+ /* Update the slot sync reason */
+ SpinLockAcquire(&slot->mutex);
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
```
I feel there is no need to update the reason if slot->slot_sync_skip_reason
and skip_reason are the same.

07. synchronize_one_slot
```
+ /*
+ * If standby is behind remote slot and the synced slot is present on
+ * local.
+ */
+ if (remote_slot->confirmed_lsn > latestFlushPtr)
+ {
+ if (synced)
+ update_slot_sync_skip_stats(slot, skip_reason);
+ return false;
+ }
```
This condition exists in the same function; can we combine them?
08. GetSlotSyncSkipReason()
Do we have to do pstrdup() here? I found a similar function
get_snapbuild_state_desc(), and it does not use it.
I checked similar functions and I think we can remove pstrdup. Removed in
the latest patch.
09.
Can you consider a test for the added code?
Added a test in the 0002 patch.
I have also addressed remaining comments and attached the latest
version of patch.
Thanks,
Shlok Kyal
Attachments:
v3-0001-Add-stats-related-to-slot-sync-skip.patch
From c248365ab0cc1dee40e236b3432306dd52ffca84 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Mon, 8 Sep 2025 21:04:42 +0530
Subject: [PATCH v3 1/2] Add stats related to slot sync skip
When slot sync is performed, it can be skipped for various reasons.
This patch adds stats for synced slots regarding such skips: new
columns slot_sync_skip_count and last_slot_sync_skip in the
pg_stat_replication_slots view, and a new column slot_sync_skip_reason
in the pg_replication_slots view.
---
contrib/test_decoding/expected/stats.out | 12 ++--
doc/src/sgml/monitoring.sgml | 20 +++++++
doc/src/sgml/system-views.sgml | 8 +++
src/backend/catalog/system_views.sql | 5 +-
src/backend/replication/logical/slotsync.c | 59 +++++++++++++++++++-
src/backend/replication/slotfuncs.c | 26 ++++++++-
src/backend/utils/activity/pgstat_replslot.c | 25 +++++++++
src/backend/utils/adt/pgstatfuncs.c | 18 ++++--
src/include/catalog/pg_proc.dat | 12 ++--
src/include/pgstat.h | 3 +
src/include/replication/slot.h | 18 ++++++
src/test/regress/expected/rules.out | 9 ++-
src/tools/pgindent/typedefs.list | 1 +
13 files changed, 193 insertions(+), 23 deletions(-)
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index de6dc416130..c1ff872c08c 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -78,17 +78,17 @@ SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count,
-- verify accessing/resetting stats for non-existent slot does something reasonable
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | slot_sync_skip_count | last_slot_sync_skip | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------+---------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
SELECT pg_stat_reset_replication_slot('do-not-exist');
ERROR: replication slot "do-not-exist" does not exist
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | slot_sync_skip_count | last_slot_sync_skip | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------+---------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
-- spilling the xact
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3f4a27a736e..65b1a8773f0 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1644,6 +1644,26 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slot_sync_skip_count</structfield><type>bigint</type>
+ </para>
+ <para>
+ Number of times slot synchronization was skipped.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_slot_sync_skip</structfield><type>timestamp with time zone</type>
+ </para>
+ <para>
+ Time at which slot synchronization was last skipped.
+ </para>
+ </entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 4187191ea74..72e9b6de1e8 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3036,6 +3036,14 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slot_sync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ Reason for the last slot synchronization skip.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c77fa0234bb..abca2f5f927 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1046,7 +1046,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.slot_sync_skip_reason
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
@@ -1061,6 +1062,8 @@ CREATE VIEW pg_stat_replication_slots AS
s.stream_bytes,
s.total_txns,
s.total_bytes,
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
s.stats_reset
FROM pg_replication_slots as r,
LATERAL pg_stat_get_replication_slot(slot_name) as s
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d55bdb..6259fad894c 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -148,6 +148,24 @@ typedef struct RemoteSlot
static void slotsync_failure_callback(int code, Datum arg);
static void update_synced_slots_inactive_since(void);
+/* Update slot sync skip stats */
+static void
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason)
+{
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slots when a
+ * slot sync is skipped.
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslot_sync_skip(slot);
+
+ /* Update the slot sync reason */
+ SpinLockAcquire(&slot->mutex);
+ if (slot->slot_sync_skip_reason != skip_reason)
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
+}
+
/*
* If necessary, update the local synced slot's metadata based on the data
* from the remote slot.
@@ -218,6 +236,8 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
LSN_FORMAT_ARGS(slot->data.restart_lsn),
slot->data.catalog_xmin));
+ update_slot_sync_skip_stats(slot, SS_SKIP_REMOTE_BEHIND);
+
if (remote_slot_precedes)
*remote_slot_precedes = true;
@@ -337,6 +357,15 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
ReplicationSlotsComputeRequiredLSN();
}
+ /*
+ * If found_consistent_snapshot is NULL or a consistent snapshot was
+ * found, set the slot sync skip reason to none. Otherwise, if a
+ * consistent snapshot was not found, the stats will be updated in
+ * update_and_persist_local_synced_slot().
+ */
+ if (!found_consistent_snapshot || *found_consistent_snapshot)
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
+
return updated_config || updated_xmin_or_lsn;
}
@@ -580,6 +609,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
* current location when recreating the slot in the next cycle. It may
* take more time to create such a slot. Therefore, we keep this slot
* and attempt the synchronization in the next cycle.
+ *
+ * We do not need to update the slot sync skip stats here as they will
+ * already have been updated in update_local_synced_slot().
*/
return false;
}
@@ -595,11 +627,21 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
LSN_FORMAT_ARGS(slot->data.restart_lsn)));
+ /*
+ * If a consistent snapshot is not found, update the slot sync skip
+ * stats.
+ */
+ update_slot_sync_skip_stats(slot, SS_SKIP_NO_CONSISTENT_SNAPSHOT);
+
return false;
}
ReplicationSlotPersist();
+ /*
+ * For the success case we do not update the slot sync skip stats here as
+ * they have already been updated in update_local_synced_slot().
+ */
ereport(LOG,
errmsg("newly created replication slot \"%s\" is sync-ready now",
remote_slot->name));
@@ -623,7 +665,7 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
static bool
synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
{
- ReplicationSlot *slot;
+ ReplicationSlot *slot = NULL;
XLogRecPtr latestFlushPtr;
bool slot_updated = false;
@@ -634,6 +676,19 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
latestFlushPtr = GetStandbyFlushRecPtr(NULL);
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
+ /* If slot is present on the local, update the slot sync skip stats */
+ if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ {
+ bool synced;
+
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
+
+ if (synced)
+ update_slot_sync_skip_stats(slot, SS_SKIP_MISSING_WAL_RECORD);
+ }
+
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
* primary server was not configured correctly.
@@ -650,7 +705,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
}
/* Search for the named slot */
- if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ if (slot || (slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
bool synced;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index b8f21153e7b..4e03205c63b 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -228,6 +228,28 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * Map a SlotSyncSkipReason enum to a human-readable string
+ */
+static char *
+GetSlotSyncSkipReason(SlotSyncSkipReason reason)
+{
+ switch (reason)
+ {
+ case SS_SKIP_NONE:
+ return "none";
+ case SS_SKIP_REMOTE_BEHIND:
+ return "remote_behind";
+ case SS_SKIP_MISSING_WAL_RECORD:
+ return "missing_wal_record";
+ case SS_SKIP_NO_CONSISTENT_SNAPSHOT:
+ return "no_consistent_snapshot";
+ }
+
+ Assert(false);
+ return "none";
+}
+
/*
* pg_get_replication_slots - SQL SRF showing all replication slots
* that currently exist on the database cluster.
@@ -235,7 +257,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +465,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ values[i++] = CStringGetTextDatum(GetSlotSyncSkipReason(slot_contents.slot_sync_skip_reason));
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index ccfb11c49bf..bf436472b8d 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -101,6 +101,31 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
pgstat_unlock_entry(entry_ref);
}
+/*
+ * Report replication slot sync skip statistics.
+ *
+ * We can rely on the stats for the slot to exist and to belong to this
+ * slot. We can only get here if pgstat_create_replslot() or
+ * pgstat_acquire_replslot() have already been called.
+ */
+void
+pgstat_report_replslot_sync_skip(ReplicationSlot *slot)
+{
+ PgStat_EntryRef *entry_ref;
+ PgStatShared_ReplSlot *shstatent;
+ PgStat_StatReplSlotEntry *statent;
+
+ entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
+ ReplicationSlotIndex(slot), false);
+ shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
+ statent = &shstatent->stats;
+
+ statent->slot_sync_skip_count += 1;
+ statent->last_slot_sync_skip = GetCurrentTimestamp();
+
+ pgstat_unlock_entry(entry_ref);
+}
+
/*
* Report replication slot creation.
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index c756c2bebaa..15500a77701 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2100,7 +2100,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
Datum
pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 12
text *slotname_text = PG_GETARG_TEXT_P(0);
NameData slotname;
TupleDesc tupdesc;
@@ -2129,7 +2129,11 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 10, "slot_sync_skip_count",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 11, "last_slot_sync_skip",
+ TIMESTAMPTZOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 12, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
@@ -2154,11 +2158,17 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
values[6] = Int64GetDatum(slotent->stream_bytes);
values[7] = Int64GetDatum(slotent->total_txns);
values[8] = Int64GetDatum(slotent->total_bytes);
+ values[9] = Int64GetDatum(slotent->slot_sync_skip_count);
+
+ if (slotent->last_slot_sync_skip == 0)
+ nulls[10] = true;
+ else
+ values[10] = TimestampTzGetDatum(slotent->last_slot_sync_skip);
if (slotent->stat_reset_timestamp == 0)
- nulls[9] = true;
+ nulls[11] = true;
else
- values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+ values[11] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 01eba3b5a19..649a6c0f78c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5687,9 +5687,9 @@
{ oid => '6169', descr => 'statistics: information about replication slot',
proname => 'pg_stat_get_replication_slot', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => 'text',
- proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
+ proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,slot_sync_skip_count,last_slot_sync_skip,stats_reset}',
prosrc => 'pg_stat_get_replication_slot' },
{ oid => '6230', descr => 'statistics: check if a stats object exists',
@@ -11503,9 +11503,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slot_sync_skip_reason}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f402b17295c..906d7b7799f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -395,6 +395,8 @@ typedef struct PgStat_StatReplSlotEntry
PgStat_Counter stream_bytes;
PgStat_Counter total_txns;
PgStat_Counter total_bytes;
+ PgStat_Counter slot_sync_skip_count;
+ TimestampTz last_slot_sync_skip;
TimestampTz stat_reset_timestamp;
} PgStat_StatReplSlotEntry;
@@ -736,6 +738,7 @@ extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
extern void pgstat_reset_replslot(const char *name);
struct ReplicationSlot;
extern void pgstat_report_replslot(struct ReplicationSlot *slot, const PgStat_StatReplSlotEntry *repSlotStat);
+extern void pgstat_report_replslot_sync_skip(struct ReplicationSlot *slot);
extern void pgstat_create_replslot(struct ReplicationSlot *slot);
extern void pgstat_acquire_replslot(struct ReplicationSlot *slot);
extern void pgstat_drop_replslot(struct ReplicationSlot *slot);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index fe62162cde3..013f4bae942 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -71,6 +71,21 @@ typedef enum ReplicationSlotInvalidationCause
/* Maximum number of invalidation causes */
#define RS_INVAL_MAX_CAUSES 4
+/*
+ * When the slot sync worker is running or pg_sync_replication_slots is run,
+ * the slot sync can be skipped. This enum lists the reasons a slot sync can
+ * be skipped.
+ */
+typedef enum SlotSyncSkipReason
+{
+ SS_SKIP_NONE, /* No skip */
+ SS_SKIP_MISSING_WAL_RECORD, /* Standby did not flush the WAL corresponding
+ * to the confirmed flush LSN of the remote slot */
+ SS_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SS_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not build a consistent
+ * snapshot */
+} SlotSyncSkipReason;
+
/*
* On-Disk data of a replication slot, preserved across restarts.
*/
@@ -249,6 +264,9 @@ typedef struct ReplicationSlot
*/
XLogRecPtr last_saved_restart_lsn;
+ /* The reason for last slot sync skip */
+ SlotSyncSkipReason slot_sync_skip_reason;
+
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 35e8aad7701..041d3328328 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1499,8 +1499,9 @@ pg_replication_slots| SELECT l.slot_name,
l.conflicting,
l.invalidation_reason,
l.failover,
- l.synced
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
+ l.synced,
+ l.slot_sync_skip_reason
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, slot_sync_skip_reason)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
@@ -2140,9 +2141,11 @@ pg_stat_replication_slots| SELECT s.slot_name,
s.stream_bytes,
s.total_txns,
s.total_bytes,
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
s.stats_reset
FROM pg_replication_slots r,
- LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
+ LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, slot_sync_skip_count, last_slot_sync_skip, stats_reset)
WHERE (r.datoid IS NOT NULL);
pg_stat_slru| SELECT name,
blks_zeroed,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3c80d49b67e..3509f40875e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2788,6 +2788,7 @@ SlabSlot
SlotInvalidationCauseMap
SlotNumber
SlotSyncCtxStruct
+SlotSyncSkipReason
SlruCtl
SlruCtlData
SlruErrorCause
--
2.34.1
v3-0002-Add-test-for-new-stats-for-slot-sync-skip.patch
From 16702f41408fec8d7a35868782fa6ca68226f715 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Wed, 24 Sep 2025 15:26:37 +0530
Subject: [PATCH v3 2/2] Add test for new stats for slot sync skip
---
src/backend/replication/logical/slotsync.c | 5 +
src/test/recovery/meson.build | 1 +
src/test/recovery/t/049_slot_skip_stats.pl | 184 +++++++++++++++++++++
3 files changed, 190 insertions(+)
create mode 100644 src/test/recovery/t/049_slot_skip_stats.pl
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 6259fad894c..4359120165e 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -64,6 +64,7 @@
#include "storage/procarray.h"
#include "tcop/tcopprot.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/pg_lsn.h"
#include "utils/ps_status.h"
#include "utils/timeout.h"
@@ -994,6 +995,10 @@ synchronize_slots(WalReceiverConn *wrconn)
if (started_tx)
CommitTransactionCommand();
+#ifdef USE_INJECTION_POINTS
+ INJECTION_POINT("slot-sync-skip", NULL);
+#endif
+
return some_slot_updated;
}
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 52993c32dbb..640cbb2b796 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -57,6 +57,7 @@ tests += {
't/046_checkpoint_logical_slot.pl',
't/047_checkpoint_physical_slot.pl',
't/048_vacuum_horizon_floor.pl'
+ 't/049_slot_skip_stats.pl'
],
},
}
diff --git a/src/test/recovery/t/049_slot_skip_stats.pl b/src/test/recovery/t/049_slot_skip_stats.pl
new file mode 100644
index 00000000000..21f4859f9c1
--- /dev/null
+++ b/src/test/recovery/t/049_slot_skip_stats.pl
@@ -0,0 +1,184 @@
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my $primary = PostgreSQL::Test::Cluster->new('publisher');
+$primary->init(
+ allows_streaming => 'logical',
+ auth_extra => [ '--create-role' => 'repl_role' ]);
+$primary->append_conf(
+ 'postgresql.conf', qq{
+autovacuum = off
+max_prepared_transactions = 1
+});
+$primary->start;
+
+$primary->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+my $backup_name = 'backup';
+$primary->backup($backup_name);
+
+my $standby1 = PostgreSQL::Test::Cluster->new('standby1');
+$standby1->init_from_backup(
+ $primary, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+
+my $connstr_1 = $primary->connstr;
+$standby1->append_conf(
+ 'postgresql.conf', qq(
+hot_standby_feedback = on
+primary_slot_name = 'sb1_slot'
+primary_conninfo = '$connstr_1 dbname=postgres'
+log_min_messages = 'debug2'
+));
+$primary->psql('postgres',
+ q{SELECT pg_create_physical_replication_slot('sb1_slot');});
+
+$standby1->start;
+
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+$primary->wait_for_replay_catchup($standby1);
+
+$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+my $result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason
+ FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync'
+ AND synced"
+);
+is($result, 'none', "slot sync reason is none");
+
+
+# Change pg_hba.conf so that standby cannot connect to primary
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf('pg_hba.conf',
+ qq{local all all trust}
+);
+$primary->restart;
+
+# Advance the failover slot so that the confirmed flush LSN of the remote slot
+# becomes ahead of the standby's flushed LSN
+$primary->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE t1(a int);
+ INSERT INTO t1 values(1);
+));
+$primary->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('slot_sync', pg_current_wal_lsn());");
+
+$standby1->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Confirm that standby is behind
+$result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason
+ FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync'
+ AND synced"
+);
+is($result, 'missing_wal_record', "slot sync skip when standby is behind");
+
+$result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_count
+ FROM pg_stat_replication_slots
+ WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Repeat pg_sync_replication_slots to check slot_sync_skip_count is advancing
+$standby1->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+$result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason
+ FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync'
+ AND synced"
+);
+is($result, 'missing_wal_record', "slot sync skip when standby is behind");
+
+$result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_count
+ FROM pg_stat_replication_slots
+ WHERE slot_name = 'slot_sync'"
+);
+is($result, '2', "check slot sync skip count");
+
+# Restore the connection between primary and standby
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf(
+ 'pg_hba.conf',
+ qq{
+local all all trust
+local replication all trust
+});
+$primary->restart;
+
+# Cleanup
+$primary->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('slot_sync')");
+$primary->wait_for_replay_catchup($standby1);
+
+$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Create a new logical slot on primary
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Attach injection point
+my $standby_psql = $standby1->background_psql('postgres');
+$standby_psql->query_safe(
+ q(select injection_points_attach('slot-sync-skip','wait')));
+
+# initiate sync of the failover slots
+$standby_psql->query_until(
+ qr/slot_sync/,
+ q(
+\echo slot_sync
+select pg_sync_replication_slots();
+));
+
+$standby1->wait_for_event('client backend', 'slot-sync-skip');
+
+# the logical slot is in temporary state and the sync will skip as remote is
+# behind the freshly created slot
+$result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason
+ FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync'
+ AND synced"
+);
+is($result, 'remote_behind', "slot sync skip as remote is behind");
+
+$result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_count
+ FROM pg_stat_replication_slots
+ WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+$standby1->safe_psql('postgres',
+ q{select injection_points_detach('slot-sync-skip')});
+$standby1->safe_psql('postgres',
+ q{select injection_points_wakeup('slot-sync-skip')});
+
+done_testing();
--
2.34.1
On Thu, 18 Sept 2025 at 11:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 17, 2025 at 8:19 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
On Wed, Sep 17, 2025 at 5:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 17, 2025 at 4:24 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Shlok,

Thanks for creating the patch. Personally I prefer approach2; approach1 cannot
indicate the current status of synchronization, it just shows the history.
I feel approach2 has more information than approach1.

I also think so but Ashutosh thought that it would be hacky. Ashutosh,
did you have an opinion on this matter after seeing the patches?

Yes, I’ve looked into both the patches. Approach 1 seems quite
straightforward. In approach 2, we need to pass some additional
arguments to update_local_sync_slot and
update_and_persist_local_synced_slot, which makes it feel a little
less clean compared to approach 1, where we simply add a new function
and call it directly.

This is because the approach-1 doesn't show the latest value of
sync_status. I mean in the latest cycle if the sync is successful, it
won't update the stats which I am not sure is correct because users
may want to know the recent status of sync cycle. Otherwise, the patch
should be almost the same. I think we can even try to write a patch
for approach-2 without an additional out parameter in some of the
functions.
Hi Amit,
I have written a patch which removes passing this extra parameter to
the functions.
I have attached the latest patch in [1].
[1]: /messages/by-id/CANhcyEVFZN2Mkjs0QHshKm2_3AkQ0eufjkD12eL2MeuVkPyGbw@mail.gmail.com
Thanks,
Shlok Kyal
Hi Ashutosh,
Thanks for reviewing the patch.
On Mon, 22 Sept 2025 at 10:59, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi,
@@ -646,7 +670,11 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				remote_slot->name,
 				LSN_FORMAT_ARGS(latestFlushPtr)));
-		return false;
+		/* If slot is present on the local, update the slot sync skip stats */
+		if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+			skip_reason = SLOT_SYNC_SKIP_STANDBY_BEHIND;
+		else
+			return false;

With this change, you’re likely enforcing sync slot creation, whereas
earlier that might not have been the case. This introduces a
behavioral change, which may not be well received.
--
I have fixed it in the latest version of the patch.
I think we can avoid passing skip_reason as a new argument to
update_local_synced_slot(). It only needs to be passed to
update_and_persist_local_synced_slot(). When
update_local_synced_slot() is invoked from within
update_and_persist_local_synced_slot(), we can simply rely on the two
flags, remote_slot_precedes and found_consistent_snapshot and set the
skip_reason accordingly, thoughts?If update_local_synced_slot is being called from any other place that
means the slot is already persisted.
I came up with a solution on similar lines as above in the attached patch.
And also as per Amit's comment in [1], I have kept the behaviour to
show skip reason even for persistent slots.
--
+typedef enum SlotSyncSkipReason
+{
+	SLOT_SYNC_SKIP_NONE,					/* No skip */
+	SLOT_SYNC_SKIP_STANDBY_BEHIND,			/* Standby is behind the remote slot */
+	SLOT_SYNC_SKIP_REMOTE_BEHIND,			/* Remote slot is behind the local slot */
+	SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT	/* Standby could not reach a
+											 * consistent snapshot */
+} SlotSyncSkipReason;
+

I would suggest shortening the enum names like maybe SS_SKIP_NONE
instead of SLOT_SYNC_SKIP_NONE.
Fixed it.
I have attached the latest patch [2].
[1]: /messages/by-id/CAA4eK1+6UsZiN0GnRNue_Vs8007jdcDFetNq+apubHcrqzjwpQ@mail.gmail.com
[2]: /messages/by-id/CANhcyEVFZN2Mkjs0QHshKm2_3AkQ0eufjkD12eL2MeuVkPyGbw@mail.gmail.com
Thanks,
Shlok Kyal
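To make the end-user workflow concrete: with a patch along these lines applied, checking the cause of a sync delay becomes a single catalog query. This is a hedged sketch: the slot_sync_skip_reason column and its values (none, missing_wal_record, remote_behind, no_consistent_snapshot) are proposals of this patch series, not part of any released PostgreSQL.

```sql
-- Hypothetical: slot_sync_skip_reason is the column proposed by this
-- patch series; it does not exist in released PostgreSQL versions.
-- Run on the standby where the slot sync worker operates.
SELECT slot_name,
       synced,
       slot_sync_skip_reason
  FROM pg_replication_slots
 WHERE failover;
```

A value other than 'none' tells the user why the last sync cycle skipped the slot, without trawling the server logs, which was the original complaint in this thread.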
On Wed, 24 Sept 2025 at 16:39, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
On Wed, 17 Sept 2025 at 16:24, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Shlok,

Thanks for creating the patch. Personally I prefer approach2; approach1 cannot
indicate the current status of synchronization, it just shows the history.
I feel approach2 has more information than approach1.

I agree with your point. I think the preferred behaviour of
slot_sync_skip_reason is to indicate the current status of
synchronization. And taking into account the current behaviour of the columns
of the views pg_replication_slots and pg_stat_replication_slots, I think
slot_sync_skip_reason is suited for the pg_replication_slots view and
I am proceeding ahead with approach2.

Below are my comments for your patch.
01.
```
+       Number of times the slot sync is skipped.
```
"slot sync" should be "slot synchronization". Same thing can be said for other
attributes.

02.
```
+       Reason of the last slot sync skip.
```
Possible values must be clarified.

03.
```
+            s.slot_sync_skip_count,
+            s.last_slot_sync_skip,
+	     s.slot_sync_skip_reason,
```
Last line has tab-blank but others have space-blank.

04.
```
+typedef enum SlotSyncSkipReason
+{
+	SLOT_SYNC_SKIP_NONE,					/* No skip */
+	SLOT_SYNC_SKIP_STANDBY_BEHIND,			/* Standby is behind the remote slot */
+	SLOT_SYNC_SKIP_REMOTE_BEHIND,			/* Remote slot is behind the local slot */
+	SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT	/* Standby could not reach a
+											 * consistent snapshot */
+} SlotSyncSkipReason
```

a.
Can we add a comment atop the enum?

b.
SLOT_SYNC_SKIP_STANDBY_BEHIND is misleading; it indicates the standby server has
not received WALs required by slots, but the enum does not clarify it.
How about SLOT_SYNC_SKIP_MISSING_WAL_RECORDS or something? Better naming is very
welcome.

c.
s/reach/build/.

05.
```
@@ -646,11 +652,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				remote_slot->name,
 				LSN_FORMAT_ARGS(latestFlushPtr)));
-		return false;
+		/* If the slot is not present on the local */
+		if (!(slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+			return false;
 	}
```
Looks like if users try to sync slots via SQL interfaces, the statistics cannot
be updated.

06. update_slot_sync_skip_stats
```
+	/* Update the slot sync reason */
+	SpinLockAcquire(&slot->mutex);
+	slot->slot_sync_skip_reason = skip_reason;
+	SpinLockRelease(&slot->mutex);
```
I feel no need to update the reason if the slot->slot_sync_skip_reason
and skip_reason are the same.

07. synchronize_one_slot
```
+	/*
+	 * If standby is behind remote slot and the synced slot is present on
+	 * local.
+	 */
+	if (remote_slot->confirmed_lsn > latestFlushPtr)
+	{
+		if (synced)
+			update_slot_sync_skip_stats(slot, skip_reason);
+		return false;
+	}
```
This condition exists in the same function; can we combine?

08. GetSlotSyncSkipReason()
Do we have to do pstrdup() here? I found a similar function get_snapbuild_state_desc(),
and it does not use it.

I checked similar functions and I think we can remove pstrdup(). Removed in
the latest patch.

09.
Can you consider a test for added codes?

Added test in 0002 patch
I have also addressed remaining comments and attached the latest
version of patch.
The CF Bot was failing as meson.build was not formatted appropriately.
I have fixed it and made some cosmetic changes in the test file.
I ran the CF tests on my local repository and it is passing all tests.
I have attached the updated version.
Thanks,
Shlok Kyal
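On the cumulative-statistics side, the same caveat applies: slot_sync_skip_count and last_slot_sync_skip are the column names proposed in v4, not present in released PostgreSQL. A monitoring sketch:

```sql
-- Hypothetical: slot_sync_skip_count and last_slot_sync_skip are columns
-- proposed by this patch series, not present in released PostgreSQL.
SELECT slot_name,
       slot_sync_skip_count,
       last_slot_sync_skip
  FROM pg_stat_replication_slots;

-- The counters are cumulative; the existing per-slot reset function
-- clears them along with the other slot statistics.
SELECT pg_stat_reset_replication_slot('slot_sync');
```

A skip count that keeps advancing between sync cycles, as the 0002 test verifies, means the worker is repeatedly skipping the slot for the reason shown in pg_replication_slots.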
Attachments:
v4-0001-Add-stats-related-to-slot-sync-skip.patch
From 99c8350040a14b38eeebf3e44d2b190fe1c44889 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Mon, 8 Sep 2025 21:04:42 +0530
Subject: [PATCH v4 1/2] Add stats related to slot sync skip
When slot sync is performed, it can happen that it is skipped due to
various reasons. This patch adds stats for synced slots regarding this
slot sync skip. This patch adds new columns slot_sync_skip_count,
last_slot_sync_skip to view pg_stat_replication_slots and new column
slot_sync_skip_reason to view pg_replication_slots.
---
contrib/test_decoding/expected/stats.out | 12 ++--
doc/src/sgml/monitoring.sgml | 20 +++++++
doc/src/sgml/system-views.sgml | 8 +++
src/backend/catalog/system_views.sql | 5 +-
src/backend/replication/logical/slotsync.c | 59 +++++++++++++++++++-
src/backend/replication/slotfuncs.c | 26 ++++++++-
src/backend/utils/activity/pgstat_replslot.c | 25 +++++++++
src/backend/utils/adt/pgstatfuncs.c | 18 ++++--
src/include/catalog/pg_proc.dat | 12 ++--
src/include/pgstat.h | 3 +
src/include/replication/slot.h | 18 ++++++
src/test/regress/expected/rules.out | 9 ++-
src/tools/pgindent/typedefs.list | 1 +
13 files changed, 193 insertions(+), 23 deletions(-)
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index de6dc416130..c1ff872c08c 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -78,17 +78,17 @@ SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count,
-- verify accessing/resetting stats for non-existent slot does something reasonable
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | slot_sync_skip_count | last_slot_sync_skip | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------+---------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
SELECT pg_stat_reset_replication_slot('do-not-exist');
ERROR: replication slot "do-not-exist" does not exist
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | slot_sync_skip_count | last_slot_sync_skip | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------+---------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
-- spilling the xact
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3f4a27a736e..65b1a8773f0 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1644,6 +1644,26 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slot_sync_skip_count</structfield><type>bigint</type>
+ </para>
+ <para>
+ Number of times slot synchronization was skipped.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_slot_sync_skip</structfield><type>timestamp with time zone</type>
+ </para>
+ <para>
+ Time at which slot synchronization was last skipped.
+ </para>
+ </entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 4187191ea74..72e9b6de1e8 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3036,6 +3036,14 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slot_sync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ Reason for the last slot synchronization skip.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c77fa0234bb..abca2f5f927 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1046,7 +1046,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.slot_sync_skip_reason
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
@@ -1061,6 +1062,8 @@ CREATE VIEW pg_stat_replication_slots AS
s.stream_bytes,
s.total_txns,
s.total_bytes,
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
s.stats_reset
FROM pg_replication_slots as r,
LATERAL pg_stat_get_replication_slot(slot_name) as s
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d55bdb..6259fad894c 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -148,6 +148,24 @@ typedef struct RemoteSlot
static void slotsync_failure_callback(int code, Datum arg);
static void update_synced_slots_inactive_since(void);
+/* Update slot sync skip stats */
+static void
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason)
+{
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slots when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslot_sync_skip(slot);
+
+ /* Update the slot sync reason */
+ SpinLockAcquire(&slot->mutex);
+ if (slot->slot_sync_skip_reason != skip_reason)
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
+}
+
/*
* If necessary, update the local synced slot's metadata based on the data
* from the remote slot.
@@ -218,6 +236,8 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
LSN_FORMAT_ARGS(slot->data.restart_lsn),
slot->data.catalog_xmin));
+ update_slot_sync_skip_stats(slot, SS_SKIP_REMOTE_BEHIND);
+
if (remote_slot_precedes)
*remote_slot_precedes = true;
@@ -337,6 +357,15 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
ReplicationSlotsComputeRequiredLSN();
}
+ /*
+ * If found_consistent_snapshot is NULL or a consistent snapshot was
+ * found, set the slot sync skip reason to none. Otherwise, the stats
+ * will be updated in update_and_persist_local_synced_slot.
+ */
+ if (!found_consistent_snapshot || *found_consistent_snapshot)
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
+
return updated_config || updated_xmin_or_lsn;
}
@@ -580,6 +609,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
* current location when recreating the slot in the next cycle. It may
* take more time to create such a slot. Therefore, we keep this slot
* and attempt the synchronization in the next cycle.
+ *
+ * We do not need to update the slot sync skip stats here as it will
+ * already have been updated in update_local_synced_slot.
*/
return false;
}
@@ -595,11 +627,21 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
LSN_FORMAT_ARGS(slot->data.restart_lsn)));
+ /*
+ * If a consistent snapshot is not found, update the slot sync skip
+ * stats
+ */
+ update_slot_sync_skip_stats(slot, SS_SKIP_NO_CONSISTENT_SNAPSHOT);
+
return false;
}
ReplicationSlotPersist();
+ /*
+ * For the success case we do not update the slot sync skip stats here as
+ * it is already updated in update_local_synced_slot.
+ */
ereport(LOG,
errmsg("newly created replication slot \"%s\" is sync-ready now",
remote_slot->name));
@@ -623,7 +665,7 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
static bool
synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
{
- ReplicationSlot *slot;
+ ReplicationSlot *slot = NULL;
XLogRecPtr latestFlushPtr;
bool slot_updated = false;
@@ -634,6 +676,19 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
latestFlushPtr = GetStandbyFlushRecPtr(NULL);
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
+ /* If slot is present on the local, update the slot sync skip stats */
+ if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ {
+ bool synced;
+
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
+
+ if (synced)
+ update_slot_sync_skip_stats(slot, SS_SKIP_MISSING_WAL_RECORD);
+ }
+
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
* primary server was not configured correctly.
@@ -650,7 +705,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
}
/* Search for the named slot */
- if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ if (slot || (slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
bool synced;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index b8f21153e7b..4e03205c63b 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -228,6 +228,28 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * Map a SlotSyncSkipReason enum to a human-readable string
+ */
+static char *
+GetSlotSyncSkipReason(SlotSyncSkipReason reason)
+{
+ switch (reason)
+ {
+ case SS_SKIP_NONE:
+ return "none";
+ case SS_SKIP_REMOTE_BEHIND:
+ return "remote_behind";
+ case SS_SKIP_MISSING_WAL_RECORD:
+ return "missing_wal_record";
+ case SS_SKIP_NO_CONSISTENT_SNAPSHOT:
+ return "no_consistent_snapshot";
+ }
+
+ Assert(false);
+ return "none";
+}
+
/*
* pg_get_replication_slots - SQL SRF showing all replication slots
* that currently exist on the database cluster.
@@ -235,7 +257,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +465,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ values[i++] = CStringGetTextDatum(GetSlotSyncSkipReason(slot_contents.slot_sync_skip_reason));
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index ccfb11c49bf..bf436472b8d 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -101,6 +101,31 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
pgstat_unlock_entry(entry_ref);
}
+/*
+ * Report replication slot sync skip statistics.
+ *
+ * We can rely on the stats for the slot to exist and to belong to this
+ * slot. We can only get here if pgstat_create_replslot() or
+ * pgstat_acquire_replslot() have already been called.
+ */
+void
+pgstat_report_replslot_sync_skip(ReplicationSlot *slot)
+{
+ PgStat_EntryRef *entry_ref;
+ PgStatShared_ReplSlot *shstatent;
+ PgStat_StatReplSlotEntry *statent;
+
+ entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
+ ReplicationSlotIndex(slot), false);
+ shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
+ statent = &shstatent->stats;
+
+ statent->slot_sync_skip_count += 1;
+ statent->last_slot_sync_skip = GetCurrentTimestamp();
+
+ pgstat_unlock_entry(entry_ref);
+}
+
/*
* Report replication slot creation.
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index c756c2bebaa..15500a77701 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2100,7 +2100,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
Datum
pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 12
text *slotname_text = PG_GETARG_TEXT_P(0);
NameData slotname;
TupleDesc tupdesc;
@@ -2129,7 +2129,11 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 10, "slot_sync_skip_count",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 11, "last_slot_sync_skip",
+ TIMESTAMPTZOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 12, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
@@ -2154,11 +2158,17 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
values[6] = Int64GetDatum(slotent->stream_bytes);
values[7] = Int64GetDatum(slotent->total_txns);
values[8] = Int64GetDatum(slotent->total_bytes);
+ values[9] = Int64GetDatum(slotent->slot_sync_skip_count);
+
+ if (slotent->last_slot_sync_skip == 0)
+ nulls[10] = true;
+ else
+ values[10] = TimestampTzGetDatum(slotent->last_slot_sync_skip);
if (slotent->stat_reset_timestamp == 0)
- nulls[9] = true;
+ nulls[11] = true;
else
- values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+ values[11] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 01eba3b5a19..649a6c0f78c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5687,9 +5687,9 @@
{ oid => '6169', descr => 'statistics: information about replication slot',
proname => 'pg_stat_get_replication_slot', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => 'text',
- proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
+ proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,slot_sync_skip_count,last_slot_sync_skip,stats_reset}',
prosrc => 'pg_stat_get_replication_slot' },
{ oid => '6230', descr => 'statistics: check if a stats object exists',
@@ -11503,9 +11503,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slot_sync_skip_reason}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index e4a59a30b8c..e9a506fa5bb 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -396,6 +396,8 @@ typedef struct PgStat_StatReplSlotEntry
PgStat_Counter stream_bytes;
PgStat_Counter total_txns;
PgStat_Counter total_bytes;
+ PgStat_Counter slot_sync_skip_count;
+ TimestampTz last_slot_sync_skip;
TimestampTz stat_reset_timestamp;
} PgStat_StatReplSlotEntry;
@@ -737,6 +739,7 @@ extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
extern void pgstat_reset_replslot(const char *name);
struct ReplicationSlot;
extern void pgstat_report_replslot(struct ReplicationSlot *slot, const PgStat_StatReplSlotEntry *repSlotStat);
+extern void pgstat_report_replslot_sync_skip(struct ReplicationSlot *slot);
extern void pgstat_create_replslot(struct ReplicationSlot *slot);
extern void pgstat_acquire_replslot(struct ReplicationSlot *slot);
extern void pgstat_drop_replslot(struct ReplicationSlot *slot);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index fe62162cde3..013f4bae942 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -71,6 +71,21 @@ typedef enum ReplicationSlotInvalidationCause
/* Maximum number of invalidation causes */
#define RS_INVAL_MAX_CAUSES 4
+/*
+ * When slot sync worker is running or pg_sync_replication_slots is run, the
+ * slot sync can be skipped. This enum lists the reasons for skipping
+ * slot synchronization.
+ */
+typedef enum SlotSyncSkipReason
+{
+ SS_SKIP_NONE, /* No skip */
+ SS_SKIP_MISSING_WAL_RECORD, /* Standby did not flush the WAL corresponding
+ * to confirmed flush on remote slot */
+ SS_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SS_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not build a consistent
+ * snapshot */
+} SlotSyncSkipReason;
+
/*
* On-Disk data of a replication slot, preserved across restarts.
*/
@@ -249,6 +264,9 @@ typedef struct ReplicationSlot
*/
XLogRecPtr last_saved_restart_lsn;
+ /* The reason for last slot sync skip */
+ SlotSyncSkipReason slot_sync_skip_reason;
+
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 35e8aad7701..041d3328328 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1499,8 +1499,9 @@ pg_replication_slots| SELECT l.slot_name,
l.conflicting,
l.invalidation_reason,
l.failover,
- l.synced
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
+ l.synced,
+ l.slot_sync_skip_reason
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, slot_sync_skip_reason)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
@@ -2140,9 +2141,11 @@ pg_stat_replication_slots| SELECT s.slot_name,
s.stream_bytes,
s.total_txns,
s.total_bytes,
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
s.stats_reset
FROM pg_replication_slots r,
- LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
+ LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, slot_sync_skip_count, last_slot_sync_skip, stats_reset)
WHERE (r.datoid IS NOT NULL);
pg_stat_slru| SELECT name,
blks_zeroed,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3c80d49b67e..3509f40875e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2788,6 +2788,7 @@ SlabSlot
SlotInvalidationCauseMap
SlotNumber
SlotSyncCtxStruct
+SlotSyncSkipReason
SlruCtl
SlruCtlData
SlruErrorCause
--
2.34.1
Attachments:
v4-0002-Add-test-for-new-stats-for-slot-sync-skip.patch
From 541bd745ea85cc0409a166eec98c0e9a4dd0868a Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Wed, 24 Sep 2025 15:26:37 +0530
Subject: [PATCH v4 2/2] Add test for new stats for slot sync skip
---
src/backend/replication/logical/slotsync.c | 5 +
src/test/recovery/meson.build | 3 +-
src/test/recovery/t/049_slot_skip_stats.pl | 180 +++++++++++++++++++++
3 files changed, 187 insertions(+), 1 deletion(-)
create mode 100644 src/test/recovery/t/049_slot_skip_stats.pl
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 6259fad894c..4359120165e 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -64,6 +64,7 @@
#include "storage/procarray.h"
#include "tcop/tcopprot.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/pg_lsn.h"
#include "utils/ps_status.h"
#include "utils/timeout.h"
@@ -994,6 +995,10 @@ synchronize_slots(WalReceiverConn *wrconn)
if (started_tx)
CommitTransactionCommand();
+#ifdef USE_INJECTION_POINTS
+ INJECTION_POINT("slot-sync-skip", NULL);
+#endif
+
return some_slot_updated;
}
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 52993c32dbb..83a6c4b5c17 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -56,7 +56,8 @@ tests += {
't/045_archive_restartpoint.pl',
't/046_checkpoint_logical_slot.pl',
't/047_checkpoint_physical_slot.pl',
- 't/048_vacuum_horizon_floor.pl'
+ 't/048_vacuum_horizon_floor.pl',
+ 't/049_slot_skip_stats.pl'
],
},
}
diff --git a/src/test/recovery/t/049_slot_skip_stats.pl b/src/test/recovery/t/049_slot_skip_stats.pl
new file mode 100644
index 00000000000..3aa63911207
--- /dev/null
+++ b/src/test/recovery/t/049_slot_skip_stats.pl
@@ -0,0 +1,180 @@
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Skip all tests if injection points are not supported in this build
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize the primary cluster
+my $primary = PostgreSQL::Test::Cluster->new('publisher');
+$primary->init(
+ allows_streaming => 'logical',
+ auth_extra => [ '--create-role' => 'repl_role' ]);
+$primary->append_conf(
+ 'postgresql.conf', qq{
+autovacuum = off
+max_prepared_transactions = 1
+});
+$primary->start;
+
+# Load the injection_points extension
+$primary->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Take a backup of the primary for standby initialization
+my $backup_name = 'backup';
+$primary->backup($backup_name);
+
+# Initialize standby from primary backup
+my $standby1 = PostgreSQL::Test::Cluster->new('standby1');
+$standby1->init_from_backup(
+ $primary, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+
+my $connstr_1 = $primary->connstr;
+$standby1->append_conf(
+ 'postgresql.conf', qq(
+hot_standby_feedback = on
+primary_slot_name = 'sb1_slot'
+primary_conninfo = '$connstr_1 dbname=postgres'
+));
+
+# Create a physical replication slot on primary for standby
+$primary->psql('postgres',
+ q{SELECT pg_create_physical_replication_slot('sb1_slot');});
+
+$standby1->start;
+
+# Create a logical replication slot on primary for testing
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Wait for standby to catch up
+$primary->wait_for_replay_catchup($standby1);
+
+# Initial sync of replication slots
+$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Verify that initially there is no skip reason
+my $result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced"
+);
+is($result, 'none', "slot sync reason is none");
+
+# Simulate standby connection failure by modifying pg_hba.conf
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf('pg_hba.conf',
+ qq{local all all trust}
+);
+$primary->restart;
+
+# Advance the failover slot so that the confirmed flush LSN of the remote slot
+# gets ahead of the standby's flushed LSN
+$primary->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE t1(a int);
+ INSERT INTO t1 VALUES(1);
+));
+$primary->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('slot_sync', pg_current_wal_lsn());");
+
+# Attempt to sync replication slots while standby is behind
+$standby1->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Check skip reason and count when standby is behind
+$result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'missing_wal_record', "slot sync skip when standby is behind");
+
+$result = $standby1->safe_psql('postgres',
+ "SELECT slot_sync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Repeat sync to ensure skip count increments
+$standby1->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+$result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'missing_wal_record', "slot sync skip when standby is behind");
+
+$result = $standby1->safe_psql('postgres',
+ "SELECT slot_sync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '2', "check slot sync skip count");
+
+# Restore connectivity between primary and standby
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf(
+ 'pg_hba.conf',
+ qq{
+local all all trust
+local replication all trust
+});
+$primary->restart;
+
+# Cleanup: drop the logical slot and ensure standby catches up
+$primary->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('slot_sync')");
+$primary->wait_for_replay_catchup($standby1);
+
+$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Create a new logical slot for testing injection point
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Attach injection point to simulate wait
+my $standby_psql = $standby1->background_psql('postgres');
+$standby_psql->query_safe(
+ q(select injection_points_attach('slot-sync-skip','wait')));
+
+# Initiate sync of failover slots
+$standby_psql->query_until(
+ qr/slot_sync/,
+ q(
+\echo slot_sync
+select pg_sync_replication_slots();
+));
+
+# Wait for backend to reach injection point
+$standby1->wait_for_event('client backend', 'slot-sync-skip');
+
+# Logical slot is temporary and sync will skip because remote is behind
+$result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND temporary"
+);
+is($result, 'remote_behind', "slot sync skip as remote is behind");
+
+$result = $standby1->safe_psql('postgres',
+ "SELECT slot_sync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Detach injection point
+$standby1->safe_psql(
+ 'postgres', q{
+ SELECT injection_points_detach('slot-sync-skip');
+ SELECT injection_points_wakeup('slot-sync-skip');
+});
+
+done_testing();
--
2.34.1
Dear Shlok,
Thanks for updating the patch. Here are my comments.
01.
```
+ /* Update the slot sync reason */
+ SpinLockAcquire(&slot->mutex);
+ if (slot->slot_sync_skip_reason != skip_reason)
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
```
Per my understanding, the spinlock is acquired when an attribute in shared memory
is updated. Can you check the other parts and follow the rule?
02.
```
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
```
Same as 1.
03.
```
```
04.
```
+#ifdef USE_INJECTION_POINTS
+ INJECTION_POINT("slot-sync-skip", NULL);
+#endif
```
No need to do #ifdef here. INJECTION_POINT() itself checks this internally.
05.
```
+# Initialize the primary cluster
+my $primary = PostgreSQL::Test::Cluster->new('publisher');
+$primary->init(
+ allows_streaming => 'logical',
+ auth_extra => [ '--create-role' => 'repl_role' ]);
```
Do we have to create repl_role? I'm not sure where it's used.
06.
```
+$primary->append_conf(
+ 'postgresql.conf', qq{
+autovacuum = off
+max_prepared_transactions = 1
+});
```
Do we have to set max_prepared_transactions? The PREPARE command is not used.
07.
```
+# Load the injection_points extension
+$primary->safe_psql('postgres', q(CREATE EXTENSION injection_points))
```
We must check whether injection_points is available or not. See 047_checkpoint_physical_slot.pl.
08.
```
+# Initialize standby from primary backup
+my $standby1 = PostgreSQL::Test::Cluster->new('standby1');
+$standby1->init_from_backup(
+ $primary, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
```
To clarify, is there a reason why we set has_restoring? Can we remove it?
09.
```
+my $connstr_1 = $primary->connstr;
```
Since this is the only connection string in the test, the suffix _1 is not needed.
10.
```
+# Simulate standby connection failure by modifying pg_hba.conf
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf('pg_hba.conf',
+ qq{local all all trust}
+);
```
What if the system does not have Unix domain sockets? I'm afraid all connections
could be blocked in this case.
11.
```
+# Attempt to sync replication slots while standby is behind
+$standby1->psql('postgres', "SELECT pg_sync_replication_slots();");
```
Since the command can fail, can you capture the error message from the command in
a variable? Otherwise an ERROR is output to the result file, which is surprising.
```
psql:<stdin>:1: ERROR: skipping slot synchronization because the received slot sync LSN 0/03019E58 for slot "slot_sync" is ahead of the standby position 0/030000D8
[17:23:21.495](0.384s) ok 2 - slot sync skip when standby is behind
```
12.
```
+# Restore connectivity between primary and standby
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf(
+ 'pg_hba.conf',
+ qq{
+local all all trust
+local replication all trust
+});
```
Same as 10. Also, no need to do unlink() here.
13.
```
+# Create a new logical slot for testing injection point
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
```
Before this point, can you add a description of what you are testing?
14.
```
# Create a physical replication slot on primary for standby
$primary->psql('postgres',
q{SELECT pg_create_physical_replication_slot('sb1_slot');});
```
Use safe_psql instead of psql().
15.
```
- if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ if (slot || (slot = SearchNamedReplicationSlot(remote_slot->name, true)))
```
Is there a possibility that slot is not NULL here? It is set when confirmed_flush exceeds
latestFlushPtr, but in that case the function cannot reach here.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Dear Shlok,
Thanks for updating the patch. Here are my comments.
I found one more comment.
```
+ /*
+ * If found_consistent_snapshot is not NULL and a consistent snapshot is
+ * found set the slot sync skip reason to none. Else, if consistent
+ * snapshot is not found the stats will be updated in the function
+ * update_and_persist_local_synced_slot
+ */
+ if (!found_consistent_snapshot || *found_consistent_snapshot)
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
```
I think the condition is confusing; at the code level there is a path where
found_consistent_snapshot is NULL but synchronization happened (I am not sure
whether that is possible, though).
I think it is better to put update_slot_sync_skip_stats() near the sync part.
If the snapshot exists from the beginning, it can be done unconditionally;
otherwise we can check again. The attached .diffs file implements this.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
kuroda.diffs
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 4359120165e..94063f67369 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -282,6 +282,9 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
slot->data.catalog_xmin = remote_slot->catalog_xmin;
SpinLockRelease(&slot->mutex);
+ /* Synchronization happened, update the slot sync skip reason */
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
+
if (found_consistent_snapshot)
*found_consistent_snapshot = true;
}
@@ -290,6 +293,14 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
LogicalSlotAdvanceAndCheckSnapState(remote_slot->confirmed_lsn,
found_consistent_snapshot);
+ /* Update the slot sync skip reason if snapshot could be created */
+ if (SnapBuildSnapshotExists(remote_slot->restart_lsn))
+ {
+ Assert(!found_consistent_snapshot ||
+ *found_consistent_snapshot);
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
+ }
+
/* Sanity check */
if (slot->data.confirmed_flush != remote_slot->confirmed_lsn)
ereport(ERROR,
@@ -358,15 +369,6 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
ReplicationSlotsComputeRequiredLSN();
}
- /*
- * If found_consistent_snapshot is not NULL and a consistent snapshot is
- * found set the slot sync skip reason to none. Else, if consistent
- * snapshot is not found the stats will be updated in the function
- * update_and_persist_local_synced_slot
- */
- if (!found_consistent_snapshot || *found_consistent_snapshot)
- update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
-
return updated_config || updated_xmin_or_lsn;
}
On Tue, 30 Sept 2025 at 18:22, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Shlok,
Thanks for updating the patch. Here are my comments.
01.
```
+ /* Update the slot sync reason */
+ SpinLockAcquire(&slot->mutex);
+ if (slot->slot_sync_skip_reason != skip_reason)
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
```
Per my understanding, the spinlock is acquired when an attribute in shared memory
is updated. Can you check the other parts and follow the rule?

02.
```
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
```
Same as 1.
I checked and found the following comment:
* - Individual fields are protected by mutex where only the backend owning
* the slot is authorized to update the fields from its own slot. The
* backend owning the slot does not need to take this lock when reading its
* own fields, while concurrent backends not owning this slot should take the
* lock when reading this slot's data.
So in the above two cases we are updating 'slot->slot_sync_skip_reason' and
reading 'slot->data.synced', and this can happen before the slot sync worker
acquires or owns the slot.
Also, at a later stage in the same code we check the synced flag again, and we
do that while holding a spinlock. Based on these observations I think we should
take the spinlock in both cases.
03.
```
```

04.
```
+#ifdef USE_INJECTION_POINTS
+ INJECTION_POINT("slot-sync-skip", NULL);
+#endif
```
No need to do #ifdef here. INJECTION_POINT() itself checks this internally.
Fixed
05.
```
+# Initialize the primary cluster
+my $primary = PostgreSQL::Test::Cluster->new('publisher');
+$primary->init(
+ allows_streaming => 'logical',
+ auth_extra => [ '--create-role' => 'repl_role' ]);
```
Do we have to create repl_role? I'm not sure where it's used.
It is not needed, I have removed it.
06.
```
+$primary->append_conf(
+ 'postgresql.conf', qq{
+autovacuum = off
+max_prepared_transactions = 1
+});
```
Do we have to set max_prepared_transactions? The PREPARE command is not used.
It is not needed. Removed it.
07.
```
+# Load the injection_points extension
+$primary->safe_psql('postgres', q(CREATE EXTENSION injection_points))
```
We must check whether injection_points is available or not. See 047_checkpoint_physical_slot.pl.
Added the check
08.
```
+# Initialize standby from primary backup
+my $standby1 = PostgreSQL::Test::Cluster->new('standby1');
+$standby1->init_from_backup(
+ $primary, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
```
To clarify, is there a reason why we set has_restoring? Can we remove it?
It is not needed. Removed it.
09.
```
+my $connstr_1 = $primary->connstr;
```
Since this is the only connection string in the test, the suffix _1 is not needed.
Fixed
10.
```
+# Simulate standby connection failure by modifying pg_hba.conf
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf('pg_hba.conf',
+ qq{local all all trust}
+);
```
What if the system does not have Unix domain sockets? I'm afraid all connections
could be blocked in this case.
I have used an injection point to simulate this scenario instead of
changing the contents of pg_hba.conf files.
11.
```
+# Attempt to sync replication slots while standby is behind
+$standby1->psql('postgres', "SELECT pg_sync_replication_slots();");
```
Since the command can fail, can you capture the error message from the command in
a variable? Otherwise an ERROR is output to the result file, which is surprising.
```
psql:<stdin>:1: ERROR: skipping slot synchronization because the received slot sync LSN 0/03019E58 for slot "slot_sync" is ahead of the standby position 0/030000D8
[17:23:21.495](0.384s) ok 2 - slot sync skip when standby is behind
```
In the existing tests I find similar cases where ERROR is output to
the result file.
For example in recovery/001_stream file:
[10:33:55.443](0.099s) ok 5 - pg_sequence_last_value() on unlogged
sequence on standby 1
psql:<stdin>:1: ERROR: cannot execute INSERT in a read-only transaction
[10:33:55.468](0.025s) ok 6 - read-only queries on standby 1
psql:<stdin>:1: ERROR: cannot execute INSERT in a read-only transaction
In 006_logical_decoding
[10:34:19.233](0.050s) ok 8 - pg_recvlogical acknowledged changes
psql:<stdin>:1: ERROR: replication slot "test_slot" was not created
in this database
[10:34:19.432](0.199s) ok 9 - replaying logical slot from another database fails
psql:<stdin>:1: ERROR: database "otherdb" is used by an active
logical replication slot
DETAIL: There is 1 active slot.
But I think it is a good idea to capture the error message in a variable to
avoid ERROR messages in the regress log file.
Added the change for the same.
12.
```
+# Restore connectivity between primary and standby
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf(
+ 'pg_hba.conf',
+ qq{
+local all all trust
+local replication all trust
+});
```
Same as 10. Also, no need to do unlink() here.
I have used an injection point to simulate this scenario instead of
changing the contents of pg_hba.conf files.
13.
```
+# Create a new logical slot for testing injection point
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
```
Before this point, can you add a description of what you are testing?
Added
14.
```
# Create a physical replication slot on primary for standby
$primary->psql('postgres',
q{SELECT pg_create_physical_replication_slot('sb1_slot');});
```
Use safe_psql instead of psql().
Fixed
15.
```
- if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ if (slot || (slot = SearchNamedReplicationSlot(remote_slot->name, true)))
```
Is there a possibility that slot is not NULL here? It is set when confirmed_flush exceeds
latestFlushPtr, but in that case the function cannot reach here.
This change is not required. I have reverted this change.
Thanks,
Shlok Kyal
Attachments:
v5-0001-Add-stats-related-to-slot-sync-skip.patch
From 30e93d695428789b173c7ee32ecd14fbebc5b68c Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Mon, 8 Sep 2025 21:04:42 +0530
Subject: [PATCH v5 1/2] Add stats related to slot sync skip
When slot synchronization is performed, it can be skipped for various
reasons. This patch adds statistics about such skips for synced slots:
new columns slot_sync_skip_count and last_slot_sync_skip in the
pg_stat_replication_slots view, and a new column slot_sync_skip_reason
in the pg_replication_slots view.
---
contrib/test_decoding/expected/stats.out | 12 ++--
doc/src/sgml/monitoring.sgml | 20 +++++++
doc/src/sgml/system-views.sgml | 8 +++
src/backend/catalog/system_views.sql | 5 +-
src/backend/replication/logical/slotsync.c | 61 +++++++++++++++++++-
src/backend/replication/slotfuncs.c | 26 ++++++++-
src/backend/replication/walreceiver.c | 3 +
src/backend/utils/activity/pgstat_replslot.c | 25 ++++++++
src/backend/utils/adt/pgstatfuncs.c | 18 ++++--
src/include/catalog/pg_proc.dat | 12 ++--
src/include/pgstat.h | 3 +
src/include/replication/slot.h | 18 ++++++
src/test/regress/expected/rules.out | 9 ++-
src/tools/pgindent/typedefs.list | 1 +
14 files changed, 199 insertions(+), 22 deletions(-)
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index 28da9123cc8..933dc0f08af 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -78,17 +78,17 @@ SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count,
-- verify accessing/resetting stats for non-existent slot does something reasonable
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slot_sync_skip_count | last_slot_sync_skip | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+----------------------+---------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
SELECT pg_stat_reset_replication_slot('do-not-exist');
ERROR: replication slot "do-not-exist" does not exist
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slot_sync_skip_count | last_slot_sync_skip | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+----------------------+---------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
-- spilling the xact
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dc4fc29466d..2610befbad3 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1655,6 +1655,26 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slot_sync_skip_count</structfield><type>bigint</type>
+ </para>
+ <para>
+ Number of times slot synchronization was skipped.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_slot_sync_skip</structfield><type>timestamp with time zone</type>
+ </para>
+ <para>
+ Time at which slot synchronization was last skipped.
+ </para>
+ </entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 7971498fe75..23aa78649ee 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3102,6 +3102,14 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slot_sync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ Reason for the last slot synchronization skip.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 823776c1498..f8e87e216b6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1060,7 +1060,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.slot_sync_skip_reason
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
@@ -1076,6 +1077,8 @@ CREATE VIEW pg_stat_replication_slots AS
s.mem_exceeded_count,
s.total_txns,
s.total_bytes,
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
s.stats_reset
FROM pg_replication_slots as r,
LATERAL pg_stat_get_replication_slot(slot_name) as s
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d55bdb..79fb16feae2 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -64,6 +64,7 @@
#include "storage/procarray.h"
#include "tcop/tcopprot.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/pg_lsn.h"
#include "utils/ps_status.h"
#include "utils/timeout.h"
@@ -148,6 +149,24 @@ typedef struct RemoteSlot
static void slotsync_failure_callback(int code, Datum arg);
static void update_synced_slots_inactive_since(void);
+/* Update slot sync skip stats */
+static void
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason)
+{
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slot when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslot_sync_skip(slot);
+
+ /* Update the slot sync reason */
+ SpinLockAcquire(&slot->mutex);
+ if (slot->slot_sync_skip_reason != skip_reason)
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
+}
+
/*
* If necessary, update the local synced slot's metadata based on the data
* from the remote slot.
@@ -218,6 +237,8 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
LSN_FORMAT_ARGS(slot->data.restart_lsn),
slot->data.catalog_xmin));
+ update_slot_sync_skip_stats(slot, SS_SKIP_REMOTE_BEHIND);
+
if (remote_slot_precedes)
*remote_slot_precedes = true;
@@ -261,6 +282,9 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
slot->data.catalog_xmin = remote_slot->catalog_xmin;
SpinLockRelease(&slot->mutex);
+ /* Synchronization happened, update the slot sync skip reason */
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
+
if (found_consistent_snapshot)
*found_consistent_snapshot = true;
}
@@ -277,6 +301,13 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
errdetail_internal("Remote slot has LSN %X/%08X but local slot has LSN %X/%08X.",
LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
LSN_FORMAT_ARGS(slot->data.confirmed_flush)));
+ else
+ {
+ /* Update the slot sync stats */
+ Assert(!found_consistent_snapshot ||
+ *found_consistent_snapshot);
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
+ }
}
updated_xmin_or_lsn = true;
@@ -580,6 +611,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
* current location when recreating the slot in the next cycle. It may
* take more time to create such a slot. Therefore, we keep this slot
* and attempt the synchronization in the next cycle.
+ *
+ * We do not need to update the slot sync skip stats here as they will
+ * already have been updated in update_local_synced_slot.
*/
return false;
}
@@ -595,11 +629,21 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
LSN_FORMAT_ARGS(slot->data.restart_lsn)));
+ /*
+ * If a consistent snapshot is not found, update the slot sync skip
+ * stats
+ */
+ update_slot_sync_skip_stats(slot, SS_SKIP_NO_CONSISTENT_SNAPSHOT);
+
return false;
}
ReplicationSlotPersist();
+ /*
+ * For the success case we do not update the slot sync skip stats here as
+ * they have already been updated in update_local_synced_slot.
+ */
ereport(LOG,
errmsg("newly created replication slot \"%s\" is sync-ready now",
remote_slot->name));
@@ -623,7 +667,7 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
static bool
synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
{
- ReplicationSlot *slot;
+ ReplicationSlot *slot = NULL;
XLogRecPtr latestFlushPtr;
bool slot_updated = false;
@@ -634,6 +678,19 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
latestFlushPtr = GetStandbyFlushRecPtr(NULL);
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
+ /* If slot is present on the local, update the slot sync skip stats */
+ if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ {
+ bool synced;
+
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
+
+ if (synced)
+ update_slot_sync_skip_stats(slot, SS_SKIP_MISSING_WAL_RECORD);
+ }
+
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
* primary server was not configured correctly.
@@ -939,6 +996,8 @@ synchronize_slots(WalReceiverConn *wrconn)
if (started_tx)
CommitTransactionCommand();
+ INJECTION_POINT("slot-sync-skip", NULL);
+
return some_slot_updated;
}
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index b8f21153e7b..4e03205c63b 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -228,6 +228,28 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * Map a SlotSyncSkipReason enum to a human-readable string
+ */
+static char *
+GetSlotSyncSkipReason(SlotSyncSkipReason reason)
+{
+ switch (reason)
+ {
+ case SS_SKIP_NONE:
+ return "none";
+ case SS_SKIP_REMOTE_BEHIND:
+ return "remote_behind";
+ case SS_SKIP_MISSING_WAL_RECORD:
+ return "missing_wal_record";
+ case SS_SKIP_NO_CONSISTENT_SNAPSHOT:
+ return "no_consistent_snapshot";
+ }
+
+ Assert(false);
+ return "none";
+}
+
/*
* pg_get_replication_slots - SQL SRF showing all replication slots
* that currently exist on the database cluster.
@@ -235,7 +257,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +465,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ values[i++] = CStringGetTextDatum(GetSlotSyncSkipReason(slot_contents.slot_sync_skip_reason));
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 7361ffc9dcf..1788aa61b61 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -75,6 +75,7 @@
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/guc.h"
+#include "utils/injection_point.h"
#include "utils/pg_lsn.h"
#include "utils/ps_status.h"
#include "utils/timestamp.h"
@@ -430,6 +431,8 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
/* Process any requests or signals received recently */
CHECK_FOR_INTERRUPTS();
+ INJECTION_POINT("walreceiver", NULL);
+
if (ConfigReloadPending)
{
ConfigReloadPending = false;
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index d210c261ac6..ddfbe97d87d 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -102,6 +102,31 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
pgstat_unlock_entry(entry_ref);
}
+/*
+ * Report replication slot sync skip statistics.
+ *
+ * We can rely on the stats for the slot to exist and to belong to this
+ * slot. We can only get here if pgstat_create_replslot() or
+ * pgstat_acquire_replslot() have already been called.
+ */
+void
+pgstat_report_replslot_sync_skip(ReplicationSlot *slot)
+{
+ PgStat_EntryRef *entry_ref;
+ PgStatShared_ReplSlot *shstatent;
+ PgStat_StatReplSlotEntry *statent;
+
+ entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
+ ReplicationSlotIndex(slot), false);
+ shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
+ statent = &shstatent->stats;
+
+ statent->slot_sync_skip_count += 1;
+ statent->last_slot_sync_skip = GetCurrentTimestamp();
+
+ pgstat_unlock_entry(entry_ref);
+}
+
/*
* Report replication slot creation.
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 1fe33df2756..ba0904e3820 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2121,7 +2121,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
Datum
pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 11
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 13
text *slotname_text = PG_GETARG_TEXT_P(0);
NameData slotname;
TupleDesc tupdesc;
@@ -2152,7 +2152,11 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 10, "total_bytes",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 11, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 11, "slot_sync_skip_count",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 12, "last_slot_sync_skip",
+ TIMESTAMPTZOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 13, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
@@ -2178,11 +2182,17 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
values[7] = Int64GetDatum(slotent->mem_exceeded_count);
values[8] = Int64GetDatum(slotent->total_txns);
values[9] = Int64GetDatum(slotent->total_bytes);
+ values[10] = Int64GetDatum(slotent->slot_sync_skip_count);
+
+ if (slotent->last_slot_sync_skip == 0)
+ nulls[11] = true;
+ else
+ values[11] = TimestampTzGetDatum(slotent->last_slot_sync_skip);
if (slotent->stat_reset_timestamp == 0)
- nulls[10] = true;
+ nulls[12] = true;
else
- values[10] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+ values[12] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b51d2b17379..de7be5fb252 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5691,9 +5691,9 @@
{ oid => '6169', descr => 'statistics: information about replication slot',
proname => 'pg_stat_get_replication_slot', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => 'text',
- proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,mem_exceeded_count,total_txns,total_bytes,stats_reset}',
+ proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,mem_exceeded_count,total_txns,total_bytes,slot_sync_skip_count,last_slot_sync_skip,stats_reset}',
prosrc => 'pg_stat_get_replication_slot' },
{ oid => '6230', descr => 'statistics: check if a stats object exists',
@@ -11511,9 +11511,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slot_sync_skip_reason}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index bc8077cbae6..24f1c3ecc87 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -398,6 +398,8 @@ typedef struct PgStat_StatReplSlotEntry
PgStat_Counter mem_exceeded_count;
PgStat_Counter total_txns;
PgStat_Counter total_bytes;
+ PgStat_Counter slot_sync_skip_count;
+ TimestampTz last_slot_sync_skip;
TimestampTz stat_reset_timestamp;
} PgStat_StatReplSlotEntry;
@@ -741,6 +743,7 @@ extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
extern void pgstat_reset_replslot(const char *name);
struct ReplicationSlot;
extern void pgstat_report_replslot(struct ReplicationSlot *slot, const PgStat_StatReplSlotEntry *repSlotStat);
+extern void pgstat_report_replslot_sync_skip(struct ReplicationSlot *slot);
extern void pgstat_create_replslot(struct ReplicationSlot *slot);
extern void pgstat_acquire_replslot(struct ReplicationSlot *slot);
extern void pgstat_drop_replslot(struct ReplicationSlot *slot);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index fe62162cde3..013f4bae942 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -71,6 +71,21 @@ typedef enum ReplicationSlotInvalidationCause
/* Maximum number of invalidation causes */
#define RS_INVAL_MAX_CAUSES 4
+/*
+ * While the slot sync worker is running or pg_sync_replication_slots() is
+ * executed, synchronization of a slot can be skipped. This enum lists the
+ * possible reasons for such a skip.
+ */
+typedef enum SlotSyncSkipReason
+{
+ SS_SKIP_NONE, /* No skip */
+ SS_SKIP_MISSING_WAL_RECORD, /* Standby has not flushed the WAL
+ * corresponding to the remote slot's confirmed flush */
+ SS_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SS_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not build a consistent
+ * snapshot */
+} SlotSyncSkipReason;
+
/*
* On-Disk data of a replication slot, preserved across restarts.
*/
@@ -249,6 +264,9 @@ typedef struct ReplicationSlot
*/
XLogRecPtr last_saved_restart_lsn;
+ /* The reason for last slot sync skip */
+ SlotSyncSkipReason slot_sync_skip_reason;
+
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 16753b2e4c0..5f6fc151d58 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1507,8 +1507,9 @@ pg_replication_slots| SELECT l.slot_name,
l.conflicting,
l.invalidation_reason,
l.failover,
- l.synced
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
+ l.synced,
+ l.slot_sync_skip_reason
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, slot_sync_skip_reason)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
@@ -2151,9 +2152,11 @@ pg_stat_replication_slots| SELECT s.slot_name,
s.mem_exceeded_count,
s.total_txns,
s.total_bytes,
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
s.stats_reset
FROM pg_replication_slots r,
- LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, mem_exceeded_count, total_txns, total_bytes, stats_reset)
+ LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, mem_exceeded_count, total_txns, total_bytes, slot_sync_skip_count, last_slot_sync_skip, stats_reset)
WHERE (r.datoid IS NOT NULL);
pg_stat_slru| SELECT name,
blks_zeroed,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5290b91e83e..8ac66b81f5f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2795,6 +2795,7 @@ SlabSlot
SlotInvalidationCauseMap
SlotNumber
SlotSyncCtxStruct
+SlotSyncSkipReason
SlruCtl
SlruCtlData
SlruErrorCause
--
2.34.1
Attachment: v5-0002-Add-test-for-new-stats-for-slot-sync-skip.patch
From 306f570d4eff2e1437bce387da867a94ad25ae7b Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Wed, 15 Oct 2025 15:36:34 +0530
Subject: [PATCH v5 2/2] Add test for new stats for slot sync skip
---
src/test/recovery/meson.build | 3 +-
src/test/recovery/t/049_slot_skip_stats.pl | 193 +++++++++++++++++++++
2 files changed, 195 insertions(+), 1 deletion(-)
create mode 100644 src/test/recovery/t/049_slot_skip_stats.pl
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 52993c32dbb..83a6c4b5c17 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -56,7 +56,8 @@ tests += {
't/045_archive_restartpoint.pl',
't/046_checkpoint_logical_slot.pl',
't/047_checkpoint_physical_slot.pl',
- 't/048_vacuum_horizon_floor.pl'
+ 't/048_vacuum_horizon_floor.pl',
+ 't/049_slot_skip_stats.pl'
],
},
}
diff --git a/src/test/recovery/t/049_slot_skip_stats.pl b/src/test/recovery/t/049_slot_skip_stats.pl
new file mode 100644
index 00000000000..74e49751348
--- /dev/null
+++ b/src/test/recovery/t/049_slot_skip_stats.pl
@@ -0,0 +1,193 @@
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Skip all tests if injection points are not supported in this build
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize the primary cluster
+my $primary = PostgreSQL::Test::Cluster->new('publisher');
+$primary->init(allows_streaming => 'logical');
+$primary->append_conf(
+ 'postgresql.conf', qq{
+autovacuum = off
+});
+$primary->start;
+
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$primary->check_extension('injection_points'))
+{
+ plan skip_all => 'Extension injection_points not installed';
+}
+
+# Load the injection_points extension
+$primary->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Take a backup of the primary for standby initialization
+my $backup_name = 'backup';
+$primary->backup($backup_name);
+
+# Initialize standby from primary backup
+my $standby1 = PostgreSQL::Test::Cluster->new('standby1');
+$standby1->init_from_backup($primary, $backup_name, has_streaming => 1);
+
+my $connstr = $primary->connstr;
+$standby1->append_conf(
+ 'postgresql.conf', qq(
+hot_standby_feedback = on
+primary_slot_name = 'sb1_slot'
+primary_conninfo = '$connstr dbname=postgres'
+));
+
+# Create a physical replication slot on primary for standby
+$primary->safe_psql('postgres',
+ q{SELECT pg_create_physical_replication_slot('sb1_slot');});
+
+$standby1->start;
+
+# Create a logical replication slot on primary for testing
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Wait for standby to catch up
+$primary->wait_for_replay_catchup($standby1);
+
+# Initial sync of replication slots
+$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Verify that initially there is no skip reason
+my $result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced"
+);
+is($result, 'none', "slot sync reason is none");
+
+# Hold walreceiver so WAL is not flushed on standby
+my $standby_psql = $standby1->background_psql('postgres');
+$standby_psql->query_safe(
+ q(select injection_points_attach('walreceiver','wait')));
+
+# Advance the failover slot so that the confirmed flush LSN of the remote
+# slot becomes ahead of the standby's flushed LSN
+$primary->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE t1(a int);
+ INSERT INTO t1 VALUES(1);
+));
+
+# Wait for backend to reach injection point
+$standby1->wait_for_event('walreceiver', 'walreceiver');
+
+$primary->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('slot_sync', pg_current_wal_lsn());");
+
+my ($stdout, $stderr);
+# Attempt to sync replication slots while standby is behind
+($result, $stdout, $stderr) =
+ $standby1->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Check skip reason and count when standby is behind
+$result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'missing_wal_record', "slot sync skip when standby is behind");
+
+$result = $standby1->safe_psql('postgres',
+ "SELECT slot_sync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Repeat sync to ensure skip count increments
+($result, $stdout, $stderr) =
+ $standby1->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+$result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'missing_wal_record', "slot sync skip when standby is behind");
+
+$result = $standby1->safe_psql('postgres',
+ "SELECT slot_sync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '2', "check slot sync skip count");
+
+# Detach injection point
+$standby1->safe_psql(
+ 'postgres', q{
+ SELECT injection_points_detach('walreceiver');
+ SELECT injection_points_wakeup('walreceiver');
+});
+
+# Check that skip reason is reset after successful sync
+$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+$result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'none', "slot_sync_skip_reason is reset after successful sync");
+
+# Cleanup: drop the logical slot and ensure standby catches up
+$primary->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('slot_sync')");
+$primary->wait_for_replay_catchup($standby1);
+
+$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Test for case when slot sync is skipped when the remote slot is
+# behind the local slot.
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Attach injection point to simulate wait
+$standby_psql->query_safe(
+ q(select injection_points_attach('slot-sync-skip','wait')));
+
+# Initiate sync of failover slots
+$standby_psql->query_until(
+ qr/slot_sync/,
+ q(
+\echo slot_sync
+select pg_sync_replication_slots();
+));
+
+# Wait for backend to reach injection point
+$standby1->wait_for_event('client backend', 'slot-sync-skip');
+
+# Logical slot is temporary and sync will skip because remote is behind
+$result = $standby1->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND temporary"
+);
+is($result, 'remote_behind', "slot sync skip as remote is behind");
+
+$result = $standby1->safe_psql('postgres',
+ "SELECT slot_sync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Detach injection point
+$standby1->safe_psql(
+ 'postgres', q{
+ SELECT injection_points_detach('slot-sync-skip');
+ SELECT injection_points_wakeup('slot-sync-skip');
+});
+
+done_testing();
--
2.34.1
On Wed, 1 Oct 2025 at 08:26, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Shlok,
Thanks for updating the patch. Here are my comments.
I found one more comment.
```
+ /*
+ * If found_consistent_snapshot is not NULL and a consistent snapshot is
+ * found set the slot sync skip reason to none. Else, if consistent
+ * snapshot is not found the stats will be updated in the function
+ * update_and_persist_local_synced_slot
+ */
+ if (!found_consistent_snapshot || *found_consistent_snapshot)
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
```
I think the condition is confusing; at the code level there is a path where
found_consistent_snapshot is NULL but synchronization happened. (Not sure it is
possible though.)
I think it is better to put update_slot_sync_skip_stats() near the sync part.
If the snapshot exists from the beginning, it can be done unconditionally,
otherwise we can check again. Attached .diffs file implements it.
I agree that the condition can be confusing. I checked your patch and
have a concern.
For the change:
```
LogicalSlotAdvanceAndCheckSnapState(remote_slot->confirmed_lsn,
found_consistent_snapshot);
+ /* Update the slot sync skip reason if snapshot could be created */
+ if (SnapBuildSnapshotExists(remote_slot->restart_lsn))
+ {
+ Assert(!found_consistent_snapshot ||
+ *found_consistent_snapshot);
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
+ }
+
```
I debugged and noticed that, on a successful sync slot run, in some
cases we do not enter the if condition here which can lead to
misleading slot_sync_skip_reason.
I noticed that even if the synced slot is advanced and
'found_consistent_snapshot' set as true,
SnapBuildSnapshotExists(remote_slot->restart_lsn) can return false.
As far as I understand, LogicalSlotAdvanceAndCheckSnapState will
advance the slot but will not serialize the snapshot and hence
SnapBuildSnapshotExists can return false.
I have also added a test case in 049_slot_skip_stats named
"slot_sync_skip_reason is reset after successful sync" which can
reproduce it.
I think we can update the stats when "slot->data.confirmed_flush ==
remote_slot->confirmed_lsn" instead of checking
'SnapBuildSnapshotExists'. Thoughts?
Thanks,
Shlok Kyal
Dear Shlok,
01.
```
+ /* Update the slot sync reason */
+ SpinLockAcquire(&slot->mutex);
+ if (slot->slot_sync_skip_reason != skip_reason)
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
```
Per my understanding, the spinlock is acquired when an attribute in shared
memory is being updated. Can you check other parts and follow the rule?
02.
```
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
```
Same as 1.
I checked and found the following comment:
* - Individual fields are protected by mutex where only the backend owning
* the slot is authorized to update the fields from its own slot. The
* backend owning the slot does not need to take this lock when reading its
* own fields, while concurrent backends not owning this slot should take the
* lock when reading this slot's data.

So for the above two cases we are updating the
'slot->slot_sync_skip_reason' and reading 'slot->data.synced' and this
can happen before the slot sync worker acquires the slot or owns the
slot.
Also in the same code at a later stage we are again checking the
synced flag and we do that while holding a spin lock. Based on these
observations I think we should take Spinlock in both cases.
Hmm, regarding the update_slot_sync_skip_stats(), the replication slot has already been
acquired, except in the synchronize_one_slot() case.
Can we avoid acquiring the spinlock as much as possible by adding an argument?
Or it just introduces additional complexity?
09.
```
+my $connstr_1 = $primary->connstr;
```Since this is an only connection string in the test, suffix _1 is not needed.
Fixed
Same as the comment, can you replace "standby1" with "standby"?
10.
```
+# Simulate standby connection failure by modifying pg_hba.conf
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf('pg_hba.conf',
+ qq{local all all trust}
+);
```
What if the system does not have a Unix domain socket? I'm afraid all
connections could be blocked in this case.

I have used an injection point to simulate this scenario instead of
changing the contents of the pg_hba.conf file.
Can you clarify the reason why you used the injection point?
I'm not sure the injection point is beneficial here. I feel the point can be added
when we handle the timing issue, race condition etc, but walreceiver may not have
strong reasons to stop exactly at that point.
Regarding the content of pg_hba.conf, I felt below lines might be enough:
```
local all all trust
host all all 127.0.0.1/32 trust
```
Also, here are comments for v5.
```
+ <para>
+ Reason of the last slot synchronization skip.
+ </para></entry>
```
Possible values must be clarified. This was posted in [1] but seemed to be missed.
```
+ /* Update the slot sync reason */
```
It is better to clarify updating the *skip* reason
```
- ReplicationSlot *slot;
+ ReplicationSlot *slot = NULL;
```
No need to initialize as NULL.
```
+#include "utils/injection_point.h"
...
+ INJECTION_POINT("walreceiver", NULL);
```
As I told above, I have a concern to add the injection point. I want to hear
other's opinion as well.
```
+ else
+ {
+ /* Update the slot sync stats */
+ Assert(!found_consistent_snapshot ||
+ *found_consistent_snapshot);
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
+ }
```
Your patch may have another issue; if both confirmed_flush_lsn are the same
but we do not have the consistent snapshot yet, we would get the assertion failure.
(Again, not sure it can really happen)
Can we use the condition as another if part? At that time we must clarify why
it is OK to pass in case of found_consistent_snapshot == NULL.
```
+# Attach injection point to simulate wait
+$standby_psql->query_safe(
+ q(select injection_points_attach('slot-sync-skip','wait')));
```
I have been considering whether we can remove the injection point here or not.
I think the point is used because the being synchronized slot is still temporary
one; they would be cleaned up by ReplicationSlotCleanup().
Can we reproduce the skip event for the permanent slot? I cannot come up with,
but if possible no need to introduce the injection point.
[1]: /messages/by-id/OSCPR01MB14966A618A8C61EC3DEE486A4F517A@OSCPR01MB14966.jpnprd01.prod.outlook.com
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Mon, 20 Oct 2025 at 14:27, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Shlok,
01.
```
+ /* Update the slot sync reason */
+ SpinLockAcquire(&slot->mutex);
+ if (slot->slot_sync_skip_reason != skip_reason)
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
```
Per my understanding, the spinlock is acquired when an attribute in shared
memory is being updated. Can you check other parts and follow the rule?
02.
```
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
```
Same as 1.
I checked and found the following comment:
* - Individual fields are protected by mutex where only the backend owning
* the slot is authorized to update the fields from its own slot. The
* backend owning the slot does not need to take this lock when reading its
* own fields, while concurrent backends not owning this slot should take the
* lock when reading this slot's data.
I realised that the patch does not entirely follow the above rule. As
per my understanding rule says:
1. If we want to update a field of a slot we need to own the slot (or
we can say acquire the slot)
2. In the above case we also need to take a Spinlock on the slot to
update any field.
3. If we want to read a field and if we own the slot we do not need a
spinlock on the slot.
4. If we want to read a field and if we do not own the slot we need a
spinlock on the slot.
For the current patch (v5), in the function "synchronize_one_slot" we
are calling "update_slot_sync_skip_stats" even if we do not own the
slot.
So, as per the rule I have updated "update_slot_sync_skip_stats" to
own the slot before updating.
So for the above two cases we are updating the
'slot->slot_sync_skip_reason' and reading 'slot->data.synced' and this
can happen before the slot sync worker acquires the slot or owns the
slot.
Also in the same code at a later stage we are again checking the
synced flag and we do that while holding a spin lock. Based on these
observations I think we should take Spinlock in both cases.

Hmm, regarding the update_slot_sync_skip_stats(), the replication slot has already been
acquired, except in the synchronize_one_slot() case.
Can we avoid acquiring the spinlock as much as possible by adding an argument?
Or it just introduces additional complexity?
After updating the code as per rule, I think we always have to take a
Spinlock on the slot when we are updating any field.
09.
```
+my $connstr_1 = $primary->connstr;
```Since this is an only connection string in the test, suffix _1 is not needed.
Fixed
Same as the comment, can you replace "standby1" to "stanby"?
Fixed
10.
```
+# Simulate standby connection failure by modifying pg_hba.conf
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf('pg_hba.conf',
+ qq{local all all trust}
+);
```
What if the system does not have a Unix domain socket? I'm afraid all
connections could be blocked in this case.

I have used an injection point to simulate this scenario instead of
changing the contents of the pg_hba.conf file.

Can you clarify the reason why you used the injection point?
I'm not sure the injection point is beneficial here. I feel the point can be added
when we handle the timing issue, race condition etc, but walreceiver may not have
strong reasons to stop exactly at that point.

Regarding the content of pg_hba.conf, I felt the below lines might be enough:
```
local all all trust
host all all 127.0.0.1/32 trust
```
I checked this.
By default pg_hba.conf has contents as:
```
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
# IPv6 local connections:
host all all ::1/128 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all trust
host replication all 127.0.0.1/32 trust
host replication all ::1/128 trust
```
Now for our test to prevent the streaming replication we can set pg_hba.conf to
```
local all all trust
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
```
And then to restore streaming replication we can add following to pg_hba.conf:
```
local replication all trust
host replication all 127.0.0.1/32 trust
host replication all ::1/128 trust
```
I think this would be sufficient for our testing. Thoughts?
Also, here are comments for v5.
```
+      <para>
+       Reason of the last slot synchronization skip.
+      </para></entry>
```
Possible values must be clarified. This was posted in [1] but seemed to be missed.
Sorry, I missed it. I have updated it in the latest patch.
```
+ /* Update the slot sync reason */
```
It is better to clarify that we are updating the *skip* reason.
Fixed
```
- ReplicationSlot *slot;
+ ReplicationSlot *slot = NULL;
```
No need to initialize as NULL.
Fixed
```
+#include "utils/injection_point.h"
...
+ INJECTION_POINT("walreceiver", NULL);
```
As I told above, I have a concern about adding the injection point. I want to hear
others' opinions as well.
Removed it for now; as per my analysis we can modify pg_hba.conf to
simulate the scenario.
```
+ else
+ {
+ /* Update the slot sync stats */
+ Assert(!found_consistent_snapshot ||
+ *found_consistent_snapshot);
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
+ }
```
Your patch may have another issue; if both confirmed_flush_lsn are the same
but we do not have a consistent snapshot yet, we would get an assertion failure.
(Again, not sure it can really happen.)
Can we use the condition as another if part? At that time we must clarify why
it is OK to pass in case of found_consistent_snapshot == NULL.
Fixed
```
+# Attach injection point to simulate wait
+$standby_psql->query_safe(
+    q(select injection_points_attach('slot-sync-skip','wait')));
```
I have been considering whether we can remove the injection point here or not.
I think the point is used because the slot being synchronized is still a temporary one; such slots would be cleaned up by ReplicationSlotCleanup().
Can we reproduce the skip event for a permanent slot? I cannot come up with a way, but if it is possible, there is no need to introduce the injection point.
I tried reproducing it but was not able to come up with a test without an injection point. I will keep trying to reproduce it without one.
[1]: /messages/by-id/OSCPR01MB14966A618A8C61EC3DEE486A4F517A@OSCPR01MB14966.jpnprd01.prod.outlook.com
I have attached the latest patch.
Thanks,
Shlok Kyal
Attachments:
v6-0002-Add-test-for-new-stats-for-slot-sync-skip.patch (application/octet-stream)
From c8b96465e8b58496fb861ee6451fed093625f750 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Wed, 15 Oct 2025 15:36:34 +0530
Subject: [PATCH v6 2/2] Add test for new stats for slot sync skip
---
src/test/recovery/meson.build | 3 +-
src/test/recovery/t/049_slot_skip_stats.pl | 199 +++++++++++++++++++++
2 files changed, 201 insertions(+), 1 deletion(-)
create mode 100644 src/test/recovery/t/049_slot_skip_stats.pl
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 52993c32dbb..83a6c4b5c17 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -56,7 +56,8 @@ tests += {
't/045_archive_restartpoint.pl',
't/046_checkpoint_logical_slot.pl',
't/047_checkpoint_physical_slot.pl',
- 't/048_vacuum_horizon_floor.pl'
+ 't/048_vacuum_horizon_floor.pl',
+ 't/049_slot_skip_stats.pl'
],
},
}
diff --git a/src/test/recovery/t/049_slot_skip_stats.pl b/src/test/recovery/t/049_slot_skip_stats.pl
new file mode 100644
index 00000000000..0f87960e94b
--- /dev/null
+++ b/src/test/recovery/t/049_slot_skip_stats.pl
@@ -0,0 +1,199 @@
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Skip all tests if injection points are not supported in this build
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize the primary cluster
+my $primary = PostgreSQL::Test::Cluster->new('publisher');
+$primary->init(allows_streaming => 'logical');
+$primary->append_conf(
+ 'postgresql.conf', qq{
+autovacuum = off
+});
+$primary->start;
+
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$primary->check_extension('injection_points'))
+{
+ plan skip_all => 'Extension injection_points not installed';
+}
+
+# Load the injection_points extension
+$primary->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Take a backup of the primary for standby initialization
+my $backup_name = 'backup';
+$primary->backup($backup_name);
+
+# Initialize standby from primary backup
+my $standby = PostgreSQL::Test::Cluster->new('standby');
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+
+my $connstr = $primary->connstr;
+$standby->append_conf(
+ 'postgresql.conf', qq(
+hot_standby_feedback = on
+primary_slot_name = 'sb1_slot'
+primary_conninfo = '$connstr dbname=postgres'
+));
+
+# Create a physical replication slot on primary for standby
+$primary->safe_psql('postgres',
+ q{SELECT pg_create_physical_replication_slot('sb1_slot');});
+
+$standby->start;
+
+# Create a logical replication slot on primary for testing
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Wait for standby to catch up
+$primary->wait_for_replay_catchup($standby);
+
+# Initial sync of replication slots
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Verify that initially there is no skip reason
+my $result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced"
+);
+is($result, 'none', "slot sync reason is none");
+
+# Pause the streaming replication connection so that the standby can lag behind
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf(
+ 'pg_hba.conf', qq{
+local all all trust
+host all all 127.0.0.1/32 trust
+host all all ::1/128 trust
+});
+$primary->restart;
+
+# Advance the failover slot so that the confirmed flush LSN of the remote slot
+# gets ahead of the standby's flushed LSN
+$primary->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE t1(a int);
+ INSERT INTO t1 VALUES(1);
+ SELECT pg_replication_slot_advance('slot_sync', pg_current_wal_lsn());
+));
+
+my ($stdout, $stderr);
+# Attempt to sync replication slots while standby is behind
+($result, $stdout, $stderr) =
+ $standby->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Check skip reason and count when standby is behind
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'missing_wal_record', "slot sync skip when standby is behind");
+
+$result = $standby->safe_psql('postgres',
+ "SELECT slot_sync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Repeat sync to ensure skip count increments
+($result, $stdout, $stderr) =
+ $standby->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'missing_wal_record', "slot sync skip when standby is behind");
+
+$result = $standby->safe_psql('postgres',
+ "SELECT slot_sync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '2', "check slot sync skip count");
+
+# Restore streaming replication connection
+$primary->append_conf(
+ 'pg_hba.conf', qq{
+local replication all trust
+host replication all 127.0.0.1/32 trust
+host replication all ::1/128 trust
+});
+$primary->restart;
+
+# Wait for standby to catch up
+$primary->wait_for_replay_catchup($standby);
+
+# Check that skip reason is reset after successful sync
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'none', "slot_sync_skip_reason is reset after successful sync");
+
+# Cleanup: drop the logical slot and ensure standby catches up
+$primary->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('slot_sync')");
+$primary->wait_for_replay_catchup($standby);
+
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Test for case when slot sync is skipped when the remote slot is
+# behind the local slot.
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Attach injection point to simulate wait
+my $standby_psql = $standby->background_psql('postgres');
+$standby_psql->query_safe(
+ q(select injection_points_attach('slot-sync-skip','wait')));
+
+# Initiate sync of failover slots
+$standby_psql->query_until(
+ qr/slot_sync/,
+ q(
+\echo slot_sync
+select pg_sync_replication_slots();
+));
+
+# Wait for backend to reach injection point
+$standby->wait_for_event('client backend', 'slot-sync-skip');
+
+# Logical slot is temporary and sync will skip because remote is behind
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slot_sync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND temporary"
+);
+is($result, 'remote_behind', "slot sync skip as remote is behind");
+
+$result = $standby->safe_psql('postgres',
+ "SELECT slot_sync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Detach injection point
+$standby->safe_psql(
+ 'postgres', q{
+ SELECT injection_points_detach('slot-sync-skip');
+ SELECT injection_points_wakeup('slot-sync-skip');
+});
+
+done_testing();
--
2.34.1
v6-0001-Add-stats-related-to-slot-sync-skip.patch (application/octet-stream)
From 840a414dab94d46164fe007be1b8ba1e1e23041d Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Mon, 8 Sep 2025 21:04:42 +0530
Subject: [PATCH v6 1/2] Add stats related to slot sync skip
When slot sync is performed, it can happen that it is skipped for
various reasons. This patch adds stats for synced slots regarding such
slot sync skips. It adds new columns slot_sync_skip_count,
last_slot_sync_skip to view pg_stat_replication_slots and new column
slot_sync_skip_reason to view pg_replication_slots.
---
contrib/test_decoding/expected/stats.out | 12 ++--
doc/src/sgml/monitoring.sgml | 20 ++++++
doc/src/sgml/system-views.sgml | 40 +++++++++++
src/backend/catalog/system_views.sql | 5 +-
src/backend/replication/logical/slotsync.c | 70 ++++++++++++++++++++
src/backend/replication/slotfuncs.c | 26 +++++++-
src/backend/utils/activity/pgstat_replslot.c | 25 +++++++
src/backend/utils/adt/pgstatfuncs.c | 18 +++--
src/include/catalog/pg_proc.dat | 12 ++--
src/include/pgstat.h | 3 +
src/include/replication/slot.h | 18 +++++
src/test/regress/expected/rules.out | 9 ++-
src/tools/pgindent/typedefs.list | 1 +
13 files changed, 238 insertions(+), 21 deletions(-)
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index 28da9123cc8..933dc0f08af 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -78,17 +78,17 @@ SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count,
-- verify accessing/resetting stats for non-existent slot does something reasonable
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slot_sync_skip_count | last_slot_sync_skip | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+----------------------+---------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
SELECT pg_stat_reset_replication_slot('do-not-exist');
ERROR: replication slot "do-not-exist" does not exist
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slot_sync_skip_count | last_slot_sync_skip | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+----------------------+---------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
-- spilling the xact
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index f3bf527d5b4..bfa2dbf38fc 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1659,6 +1659,26 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slot_sync_skip_count</structfield><type>bigint</type>
+ </para>
+ <para>
+ Number of times the slot synchronization is skipped.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_slot_sync_skip</structfield><type>timestamp with time zone</type>
+ </para>
+ <para>
+ Time at which last slot synchronization was skipped.
+ </para>
+ </entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 7971498fe75..f903c75e421 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3102,6 +3102,46 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slot_sync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ The reason for the last slot synchronization skip. This field is set only
+ for logical slots that are being synced from a primary server (that is,
+ those whose <structfield>synced</structfield> field is
+ <literal>true</literal>). Possible values are:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <literal>none</literal> means that the last slot synchronization
+ completed successfully.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>remote_behind</literal> means that the last slot
+ synchronization was skipped because the slot is ahead of the
+ corresponding failover slot on the primary.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>missing_wal_record</literal> means that the last slot
+ synchronization was skipped because the standby had not flushed the
+ WAL corresponding to the confirmed flush position on the remote slot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>no_consistent_snapshot</literal> means that the last slot
+ synchronization was skipped because the standby could not build a
+ consistent snapshot.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index dec8df4f8ee..877db2f87db 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1060,7 +1060,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.slot_sync_skip_reason
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
@@ -1076,6 +1077,8 @@ CREATE VIEW pg_stat_replication_slots AS
s.mem_exceeded_count,
s.total_txns,
s.total_bytes,
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
s.stats_reset
FROM pg_replication_slots as r,
LATERAL pg_stat_get_replication_slot(slot_name) as s
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index b122d99b009..47929eafcf7 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -64,6 +64,7 @@
#include "storage/procarray.h"
#include "tcop/tcopprot.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/pg_lsn.h"
#include "utils/ps_status.h"
#include "utils/timeout.h"
@@ -148,6 +149,31 @@ typedef struct RemoteSlot
static void slotsync_failure_callback(int code, Datum arg);
static void update_synced_slots_inactive_since(void);
+/* Update slot sync skip stats */
+static void
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason,
+ bool acquire_slot)
+{
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slot when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslot_sync_skip(slot);
+
+ if (acquire_slot)
+ ReplicationSlotAcquire(NameStr(slot->data.name), true, true);
+
+ /* Update the slot sync skip reason */
+ SpinLockAcquire(&slot->mutex);
+ if (slot->slot_sync_skip_reason != skip_reason)
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
+
+ if (acquire_slot)
+ ReplicationSlotRelease();
+}
+
/*
* If necessary, update the local synced slot's metadata based on the data
* from the remote slot.
@@ -218,6 +244,8 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
LSN_FORMAT_ARGS(slot->data.restart_lsn),
slot->data.catalog_xmin));
+ update_slot_sync_skip_stats(slot, SS_SKIP_REMOTE_BEHIND, false);
+
if (remote_slot_precedes)
*remote_slot_precedes = true;
@@ -261,6 +289,9 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
slot->data.catalog_xmin = remote_slot->catalog_xmin;
SpinLockRelease(&slot->mutex);
+ /* Synchronization happened, update the slot sync skip reason */
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE, false);
+
if (found_consistent_snapshot)
*found_consistent_snapshot = true;
}
@@ -277,6 +308,17 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
errdetail_internal("Remote slot has LSN %X/%08X but local slot has LSN %X/%08X.",
LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
LSN_FORMAT_ARGS(slot->data.confirmed_flush)));
+
+ /*
+ * If found_consistent_snapshot is not NULL, a true value means
+ * the slot synchronization was successful, while a false value
+ * means it was skipped (see
+ * update_and_persist_local_synced_slot()). If
+ * found_consistent_snapshot is NULL, no such check exists, so the
+ * stats can be updated directly.
+ */
+ if (!found_consistent_snapshot || *found_consistent_snapshot)
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE, false);
}
updated_xmin_or_lsn = true;
@@ -580,6 +622,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
* current location when recreating the slot in the next cycle. It may
* take more time to create such a slot. Therefore, we keep this slot
* and attempt the synchronization in the next cycle.
+ *
+ * We do not need to update the slot sync skip stats here as it will
+ * be already updated in function update_local_synced_slot.
*/
return false;
}
@@ -595,11 +640,21 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
LSN_FORMAT_ARGS(slot->data.restart_lsn)));
+ /*
+ * If a consistent snapshot is not found, update the slot sync skip
+ * stats
+ */
+ update_slot_sync_skip_stats(slot, SS_SKIP_NO_CONSISTENT_SNAPSHOT, false);
+
return false;
}
ReplicationSlotPersist();
+ /*
+ * For the success case we do not update the slot sync skip stats here as
+ * it is already updated in update_local_synced_slot.
+ */
ereport(LOG,
errmsg("newly created replication slot \"%s\" is sync-ready now",
remote_slot->name));
@@ -634,6 +689,19 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
latestFlushPtr = GetStandbyFlushRecPtr(NULL);
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
+ /* If slot is present on the local, update the slot sync skip stats */
+ if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ {
+ bool synced;
+
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
+
+ if (synced)
+ update_slot_sync_skip_stats(slot, SS_SKIP_MISSING_WAL_RECORD, true);
+ }
+
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
* primary server was not configured correctly.
@@ -939,6 +1007,8 @@ synchronize_slots(WalReceiverConn *wrconn)
if (started_tx)
CommitTransactionCommand();
+ INJECTION_POINT("slot-sync-skip", NULL);
+
return some_slot_updated;
}
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index b8f21153e7b..4e03205c63b 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -228,6 +228,28 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * Map a SlotSyncSkipReason enum to a human-readable string
+ */
+static char *
+GetSlotSyncSkipReason(SlotSyncSkipReason reason)
+{
+ switch (reason)
+ {
+ case SS_SKIP_NONE:
+ return "none";
+ case SS_SKIP_REMOTE_BEHIND:
+ return "remote_behind";
+ case SS_SKIP_MISSING_WAL_RECORD:
+ return "missing_wal_record";
+ case SS_SKIP_NO_CONSISTENT_SNAPSHOT:
+ return "no_consistent_snapshot";
+ }
+
+ Assert(false);
+ return "none";
+}
+
/*
* pg_get_replication_slots - SQL SRF showing all replication slots
* that currently exist on the database cluster.
@@ -235,7 +257,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +465,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ values[i++] = CStringGetTextDatum(GetSlotSyncSkipReason(slot_contents.slot_sync_skip_reason));
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index d210c261ac6..ddfbe97d87d 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -102,6 +102,31 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
pgstat_unlock_entry(entry_ref);
}
+/*
+ * Report replication slot sync skip statistics.
+ *
+ * We can rely on the stats for the slot to exist and to belong to this
+ * slot. We can only get here if pgstat_create_replslot() or
+ * pgstat_acquire_replslot() have already been called.
+ */
+void
+pgstat_report_replslot_sync_skip(ReplicationSlot *slot)
+{
+ PgStat_EntryRef *entry_ref;
+ PgStatShared_ReplSlot *shstatent;
+ PgStat_StatReplSlotEntry *statent;
+
+ entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
+ ReplicationSlotIndex(slot), false);
+ shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
+ statent = &shstatent->stats;
+
+ statent->slot_sync_skip_count += 1;
+ statent->last_slot_sync_skip = GetCurrentTimestamp();
+
+ pgstat_unlock_entry(entry_ref);
+}
+
/*
* Report replication slot creation.
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index a710508979e..59de7c0355e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2129,7 +2129,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
Datum
pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 11
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 13
text *slotname_text = PG_GETARG_TEXT_P(0);
NameData slotname;
TupleDesc tupdesc;
@@ -2160,7 +2160,11 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 10, "total_bytes",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 11, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 11, "slot_sync_skip_count",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 12, "last_slot_sync_skip",
+ TIMESTAMPTZOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 13, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
@@ -2186,11 +2190,17 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
values[7] = Int64GetDatum(slotent->mem_exceeded_count);
values[8] = Int64GetDatum(slotent->total_txns);
values[9] = Int64GetDatum(slotent->total_bytes);
+ values[10] = Int64GetDatum(slotent->slot_sync_skip_count);
+
+ if (slotent->last_slot_sync_skip == 0)
+ nulls[11] = true;
+ else
+ values[11] = TimestampTzGetDatum(slotent->last_slot_sync_skip);
if (slotent->stat_reset_timestamp == 0)
- nulls[10] = true;
+ nulls[12] = true;
else
- values[10] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+ values[12] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9121a382f76..577d90e4e0e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5691,9 +5691,9 @@
{ oid => '6169', descr => 'statistics: information about replication slot',
proname => 'pg_stat_get_replication_slot', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => 'text',
- proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,mem_exceeded_count,total_txns,total_bytes,stats_reset}',
+ proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,mem_exceeded_count,total_txns,total_bytes,slot_sync_skip_count,last_slot_sync_skip,stats_reset}',
prosrc => 'pg_stat_get_replication_slot' },
{ oid => '6230', descr => 'statistics: check if a stats object exists',
@@ -11511,9 +11511,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slot_sync_skip_reason}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 7ae503e71a2..6d75e47f9e5 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -398,6 +398,8 @@ typedef struct PgStat_StatReplSlotEntry
PgStat_Counter mem_exceeded_count;
PgStat_Counter total_txns;
PgStat_Counter total_bytes;
+ PgStat_Counter slot_sync_skip_count;
+ TimestampTz last_slot_sync_skip;
TimestampTz stat_reset_timestamp;
} PgStat_StatReplSlotEntry;
@@ -742,6 +744,7 @@ extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
extern void pgstat_reset_replslot(const char *name);
struct ReplicationSlot;
extern void pgstat_report_replslot(struct ReplicationSlot *slot, const PgStat_StatReplSlotEntry *repSlotStat);
+extern void pgstat_report_replslot_sync_skip(struct ReplicationSlot *slot);
extern void pgstat_create_replslot(struct ReplicationSlot *slot);
extern void pgstat_acquire_replslot(struct ReplicationSlot *slot);
extern void pgstat_drop_replslot(struct ReplicationSlot *slot);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 09c69f83d57..9d7aabfb894 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -71,6 +71,21 @@ typedef enum ReplicationSlotInvalidationCause
/* Maximum number of invalidation causes */
#define RS_INVAL_MAX_CAUSES 4
+/*
+ * When slot sync worker is running or pg_sync_replication_slots is run, the
+ * slot sync can be skipped. This enum keeps a list of reasons of slot sync
+ * skip.
+ */
+typedef enum SlotSyncSkipReason
+{
+ SS_SKIP_NONE, /* No skip */
+ SS_SKIP_MISSING_WAL_RECORD, /* Standby did not flush the WAL corresponding
+ * to confirmed flush on remote slot */
+ SS_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SS_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not build a consistent
+ * snapshot */
+} SlotSyncSkipReason;
+
/*
* On-Disk data of a replication slot, preserved across restarts.
*/
@@ -249,6 +264,9 @@ typedef struct ReplicationSlot
*/
XLogRecPtr last_saved_restart_lsn;
+ /* The reason for last slot sync skip */
+ SlotSyncSkipReason slot_sync_skip_reason;
+
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 77e25ca029e..fcbcf45d80f 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1507,8 +1507,9 @@ pg_replication_slots| SELECT l.slot_name,
l.conflicting,
l.invalidation_reason,
l.failover,
- l.synced
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
+ l.synced,
+ l.slot_sync_skip_reason
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, slot_sync_skip_reason)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
@@ -2151,9 +2152,11 @@ pg_stat_replication_slots| SELECT s.slot_name,
s.mem_exceeded_count,
s.total_txns,
s.total_bytes,
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
s.stats_reset
FROM pg_replication_slots r,
- LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, mem_exceeded_count, total_txns, total_bytes, stats_reset)
+ LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, mem_exceeded_count, total_txns, total_bytes, slot_sync_skip_count, last_slot_sync_skip, stats_reset)
WHERE (r.datoid IS NOT NULL);
pg_stat_slru| SELECT name,
blks_zeroed,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 018b5919cf6..ed3298fe7ca 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2797,6 +2797,7 @@ SlabSlot
SlotInvalidationCauseMap
SlotNumber
SlotSyncCtxStruct
+SlotSyncSkipReason
SlruCtl
SlruCtlData
SlruErrorCause
--
2.34.1
Dear Shlok,
Thanks for updating the patch. Few comments:
```
The reason for the last slot synchronization skip. This field is set only
for logical slots that are being synced from a primary server (that is,
those whose <structfield>synced</structfield> field is
<literal>true</literal>).
```
What happens if the slot has a skip reason and the standby is promoted?
Will the attribute be retained? If so, do we have to add some notes like "sync"?
```
+/* Update slot sync skip stats */
+static void
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason,
+ bool acquire_slot)
```
Let's follow existing code; in ReplicationSlotSetInactiveSince(), the third
argument is `acquire_lock`.
```
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slot when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslot_sync_skip(slot);
```
Is it OK to call pgstat_report_replslot_sync_skip() without any locks?
```
ReplicationSlotAcquire(NameStr(slot->data.name), true, true);
```
Can you clarify the reason for error_if_invalid=true? Other code in the file
uses error_if_invalid=false.
```
+ /* Update the slot sync skip reason */
+ SpinLockAcquire(&slot->mutex);
+ if (slot->slot_sync_skip_reason != skip_reason)
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
```
Now the replication slot is always acquired. Do we still have to acquire the
spinlock even for reading the value? In other words, can we move SpinLockAcquire()
and SpinLockRelease() inside the if block?
```
# Copyright (c) 2024-2025, PostgreSQL Global Development Group
```
I think 2024 can be removed.
```
my $primary = PostgreSQL::Test::Cluster->new('publisher');
```
s/publisher/primary/.
```
# Pause streaming replication connection so that standby can lag behind
unlink($primary->data_dir . '/pg_hba.conf');
$primary->append_conf(
'pg_hba.conf', qq{
local all all trust
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
});
$primary->restart;
```
I'm not sure this can be called "pausing". How about something like:
```
Update pg_hba.conf and restart the primary to reject streaming replication
connections. WAL records won't be replicated to the standby until pg_hba.conf
is restored.
```
```
# Attempt to sync replication slots while standby is behind
($result, $stdout, $stderr) =
$standby->psql('postgres', "SELECT pg_sync_replication_slots();");
```
Can you verify via $stderr that synchronization failed? I cannot find other
tests that check the message; it is enough to do this once.
```
$result = $standby->safe_psql(
'postgres',
"SELECT slot_sync_skip_reason FROM pg_replication_slots
WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
);
is($result, 'missing_wal_record', "slot sync skip when standby is behind");
```
I found the test is done twice; can we remove the second one?
```
# Cleanup: drop the logical slot and ensure standby catches up
$primary->safe_psql('postgres',
"SELECT pg_drop_replication_slot('slot_sync')");
$primary->wait_for_replay_catchup($standby);
$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
# Test for case when slot sync is skipped when the remote slot is
# behind the local slot.
$primary->safe_psql('postgres',
"SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
);
```
Can we use the reset function instead of dropping the slot?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
I have attached the latest patch.
Thanks. I have started going through it.
I’m not sure if this has already been discussed; I couldn’t find any
mention of it in the thread. Why don’t we persist
'slot_sync_skip_reason' (it is outside of
ReplicationSlotPersistentData)? If a slot wasn’t synced during the
last cycle and the server restarts, it would be helpful to know the
reason it wasn’t synced prior to the node restart.
thanks
Shveta
On Mon, Nov 3, 2025 at 3:14 PM shveta malik <shveta.malik@gmail.com> wrote:
I have attached the latest patch.
Thanks. I have started going through it.
I’m not sure if this has already been discussed; I couldn’t find any
mention of it in the thread. Why don’t we persist
'slot_sync_skip_reason' (it is outside of
ReplicationSlotPersistentData)? If a slot wasn’t synced during the
last cycle and the server restarts, it would be helpful to know the
reason it wasn’t synced prior to the node restart.
Please find a few more comments:
1)
last_slot_sync_skip
Will 'last_slotsync_skip_at' be a better name?
Since we refer worker as slotsync in docs, I feel slotsync seems a
more natural choice than slot_sync. Also '_at' gives clarity that it
is about time rather than a boolean (which currently it seems like).
Same goes for slot_sync_skip_count and slot_sync_skip_reason. Shall
these be slotsync_skip_count and slotsync_skip_reason.
2)
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason
skip_reason,
+ bool acquire_slot)
It looks like there is only one caller that passes acquire_slot as
true, while all others pass false. Instead of keeping the acquire_slot
parameter, will it be better if we remove it and add an
Assert(MyReplicationSlot) to ensure the slot is already acquired? We can
add a comment stating that this function expects the slot to be
acquired by the caller. The one caller that currently passes
acquire_slot = true can acquire the slot explicitly before invoking
this function. Thoughts?
3)
In update_and_persist_local_synced_slot(), we get both
'found_consistent_snapshot' and 'remote_slot_precedes' from
update_local_synced_slot(). But skipsync-reason for
'remote_slot_precedes' is updated inside update_local_synced_slot()
while skipsync-reason for '!found_consistent_snapshot' is updated in
caller update_and_persist_local_synced_slot. Is there a reason for
that?
4)
What about the case where the slot is invalidated and sync is skipped?
I do not see any stats for that. See 'Skip the sync of an invalidated
slot' in synchronize_one_slot(). If it is already discussed and
concluded, please add a comment.
thanks
Shveta
On Tue, Nov 4, 2025 at 3:17 PM shveta malik <shveta.malik@gmail.com> wrote:
On Mon, Nov 3, 2025 at 3:14 PM shveta malik <shveta.malik@gmail.com> wrote:
I have attached the latest patch.
Thanks. I have started going through it.
I’m not sure if this has already been discussed; I couldn’t find any
mention of it in the thread. Why don’t we persist
'slot_sync_skip_reason' (it is outside of
ReplicationSlotPersistentData)? If a slot wasn’t synced during the
last cycle and the server restarts, it would be helpful to know the
reason it wasn’t synced prior to the node restart.
Please find a few more comments:
1)
last_slot_sync_skip
Will 'last_slotsync_skip_at' be a better name?
Since we refer worker as slotsync in docs, I feel slotsync seems a
more natural choice than slot_sync. Also '_at' gives clarity that it
is about time rather than a boolean (which currently it seems like).
Same goes for slot_sync_skip_count and slot_sync_skip_reason. Shall
these be slotsync_skip_count and slotsync_skip_reason.
2)
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason,
+ bool acquire_slot)
It looks like there is only one caller that passes acquire_slot as
true, while all others pass false. Instead of keeping the acquire_slot
parameter, will it be better if we remove it and add an
Assert(MyReplicationSlot) to ensure the slot is already acquired? We can
add a comment stating that this function expects the slot to be
acquired by the caller. The one caller that currently passes
acquire_slot = true can acquire the slot explicitly before invoking
this function. Thoughts?
3)
In update_and_persist_local_synced_slot(), we get both
'found_consistent_snapshot' and 'remote_slot_precedes' from
update_local_synced_slot(). But skipsync-reason for
'remote_slot_precedes' is updated inside update_local_synced_slot()
while skipsync-reason for '!found_consistent_snapshot' is updated in
caller update_and_persist_local_synced_slot. Is there a reason for
that?
4)
What about the case where the slot is invalidated and sync is skipped?
I do not see any stats for that. See 'Skip the sync of an invalidated
slot' in synchronize_one_slot(). If it is already discussed and
concluded, please add a comment.
Few more on 001:
5)
The name SS_SKIP_MISSING_WAL_RECORD doesn’t seem appropriate. It
sounds more like some WAL issue, rather than indicating that the WAL
hasn’t been flushed. A better name could be SS_SKIP_WAL_NOT_FLUSHED.
6)
Instead of calling 'update_slot_sync_skip_stats' at multiple places,
how about we just update the skip_reason everywhere and make a call to
'update_slot_sync_skip_stats' only in synchronize_one_slot(). IMO,
that will look cleaner. Thoughts?
thanks
Shveta
On Wed, 5 Nov 2025 at 11:49, shveta malik <shveta.malik@gmail.com> wrote:
On Tue, Nov 4, 2025 at 3:17 PM shveta malik <shveta.malik@gmail.com> wrote:
On Mon, Nov 3, 2025 at 3:14 PM shveta malik <shveta.malik@gmail.com> wrote:
I have attached the latest patch.
Thanks. I have started going through it.
I’m not sure if this has already been discussed; I couldn’t find any
mention of it in the thread. Why don’t we persist
'slot_sync_skip_reason' (it is outside of
ReplicationSlotPersistentData)? If a slot wasn’t synced during the
last cycle and the server restarts, it would be helpful to know the
reason it wasn’t synced prior to the node restart.
Actually, I had not thought in this direction. I think it will be useful
to persist 'slot_sync_skip_reason', so I have made this change in the
latest patch.
Please find a few more comments:
1)
last_slot_sync_skip
Will 'last_slotsync_skip_at' be a better name?
Since we refer worker as slotsync in docs, I feel slotsync seems a
more natural choice than slot_sync. Also '_at' gives clarity that it
is about time rather than a boolean (which currently it seems like).
Same goes for slot_sync_skip_count and slot_sync_skip_reason. Shall
these be slotsync_skip_count and slotsync_skip_reason.
Fixed it.
2)
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason,
+ bool acquire_slot)
It looks like there is only one caller that passes acquire_slot as
true, while all others pass false. Instead of keeping the acquire_slot
parameter, will it be better if we remove it and add an
Assert(MyReplicationSlot) to ensure the slot is already acquired? We can
add a comment stating that this function expects the slot to be
acquired by the caller. The one caller that currently passes
acquire_slot = true can acquire the slot explicitly before invoking
this function. Thoughts?
This idea looks good to me. I have updated the patch accordingly.
3)
In update_and_persist_local_synced_slot(), we get both
'found_consistent_snapshot' and 'remote_slot_precedes' from
update_local_synced_slot(). But skipsync-reason for
'remote_slot_precedes' is updated inside update_local_synced_slot()
while skipsync-reason for '!found_consistent_snapshot' is updated in
caller update_and_persist_local_synced_slot. Is there a reason for
that?
update_and_persist_local_synced_slot() is called when the synced slot
is in a temporary state, and we call update_local_synced_slot()
directly for permanent slots. A slot sync skip when
"remote_slot_precedes" is true can happen for both permanent and
temporary slots, so I think we need to update the stats in
update_local_synced_slot().
In contrast, we skip slot sync only for temporary slots when a
consistent snapshot is not found, so I added that update in
update_and_persist_local_synced_slot().
4)
What about the case where the slot is invalidated and sync is skipped?
I do not see any stats for that. See 'Skip the sync of an invalidated
slot' in synchronize_one_slot(). If it is already discussed and
concluded, please add a comment.
It was not discussed earlier.
The pg_replication_slots view already has a column named
'invalidation_reason', and when the remote slot is invalidated, the
local slot is invalidated as well. So do we really need to record this
in 'slotsync_skip_reason' too? I think it would be somewhat redundant.
Thoughts?
Few more on 001:
5)
The name SS_SKIP_MISSING_WAL_RECORD doesn’t seem appropriate. It
sounds more like some WAL issue, rather than indicating that the WAL
hasn’t been flushed. A better name could be SS_SKIP_WAL_NOT_FLUSHED.
I think that the suggested name is better. I have updated the patch accordingly.
6)
Instead of calling 'update_slot_sync_skip_stats' at multiple places,
how about we just update the skip_reason everywhere and make a call to
'update_slot_sync_skip_stats' only in synchronize_one_slot(). IMO,
that will look cleaner. Thoughts?
I tried this approach in [1] (see v2_approach2). This approach would
require passing extra parameters to the functions.
There, Amit suggested that we should try an approach where this can be
avoided, so I came up with the current approach. See [2].
I have addressed the comments and attached the updated v7 patch.
[1]: /messages/by-id/CANhcyEXHcdoRRo0N0uib-t7mfkbotv=aYjAWAekDAbHCRe+Bng@mail.gmail.com
[2]: /messages/by-id/CAA4eK1KZLPxv7VBZf=Bp9=-pzKNfvNmFDqFYYzwkowE4FpRs1A@mail.gmail.com
Thanks,
Shlok Kyal
Attachments:
v7-0001-Add-stats-related-to-slot-sync-skip.patch (application/octet-stream)
From 9d6b7d7ec4a9d4abfe0b3153f02921946fdbd63a Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Mon, 8 Sep 2025 21:04:42 +0530
Subject: [PATCH v7 1/2] Add stats related to slot sync skip
When slot sync is performed, it can be skipped for various reasons.
This patch adds stats for synced slots regarding such slot sync skips.
It adds new columns slotsync_skip_count and slotsync_skip_at to the
view pg_stat_replication_slots and a new column slotsync_skip_reason
to the view pg_replication_slots.
---
contrib/test_decoding/expected/stats.out | 12 +--
doc/src/sgml/monitoring.sgml | 25 +++++++
doc/src/sgml/system-views.sgml | 43 +++++++++++
src/backend/catalog/system_views.sql | 5 +-
src/backend/replication/logical/slotsync.c | 77 ++++++++++++++++++++
src/backend/replication/slot.c | 1 +
src/backend/replication/slotfuncs.c | 26 ++++++-
src/backend/utils/activity/pgstat_replslot.c | 25 +++++++
src/backend/utils/adt/pgstatfuncs.c | 18 ++++-
src/include/catalog/pg_proc.dat | 12 +--
src/include/pgstat.h | 3 +
src/include/replication/slot.h | 19 ++++-
src/test/regress/expected/rules.out | 9 ++-
src/tools/pgindent/typedefs.list | 1 +
14 files changed, 254 insertions(+), 22 deletions(-)
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index 28da9123cc8..e5117f88a14 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -78,17 +78,17 @@ SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count,
-- verify accessing/resetting stats for non-existent slot does something reasonable
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slotsync_skip_count | slotsync_skip_at | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+---------------------+------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
SELECT pg_stat_reset_replication_slot('do-not-exist');
ERROR: replication slot "do-not-exist" does not exist
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slotsync_skip_count | slotsync_skip_at | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+---------------------+------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
-- spilling the xact
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 2741c138593..5332328e07b 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1659,6 +1659,31 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_skip_count</structfield><type>bigint</type>
+ </para>
+ <para>
+ Number of times slot synchronization has been skipped. The value of this
+ column has no meaning on the primary server; it defaults to 0 for all
+ slots, but may (if leftover from a promoted standby) also have a
+ positive value.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_skip_at</structfield><type>timestamp with time zone</type>
+ </para>
+ <para>
+ Time at which the last slot synchronization was skipped. The value of this
+ column has no meaning on the primary server; it defaults to NULL for all
+ slots, but may (if leftover from a promoted standby) contain a timestamp.
+ </para>
+ </entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 7971498fe75..08ed9d609e1 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3102,6 +3102,49 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ The reason for the last slot synchronization skip. This field is set only
+ for logical slots that are being synchronized from a primary server (that
+ is, those whose <structfield>synced</structfield> field is
+ <literal>true</literal>). The value of this column has no meaning on the
+ primary server; it defaults to <literal>none</literal> for all slots, but
+ may (if leftover from a promoted standby) also have a value other than
+ <literal>none</literal>. Possible values are:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <literal>none</literal> means that the last slot synchronization
+ completed successfully.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>remote_behind</literal> means that the last slot
+ synchronization was skipped because the slot is ahead of the
+ corresponding failover slot on the primary.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>wal_not_flushed</literal> means that the last slot
+ synchronization was skipped because the standby had not flushed the
+ WAL corresponding to the confirmed flush position on the remote slot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>no_consistent_snapshot</literal> means that the last slot
+ synchronization was skipped because the standby could not build a
+ consistent snapshot.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 059e8778ca7..56f754c9973 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1060,7 +1060,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.slotsync_skip_reason
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
@@ -1076,6 +1077,8 @@ CREATE VIEW pg_stat_replication_slots AS
s.mem_exceeded_count,
s.total_txns,
s.total_bytes,
+ s.slotsync_skip_count,
+ s.slotsync_skip_at,
s.stats_reset
FROM pg_replication_slots as r,
LATERAL pg_stat_get_replication_slot(slot_name) as s
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8b4afd87dc9..fef00d0805d 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -64,6 +64,7 @@
#include "storage/procarray.h"
#include "tcop/tcopprot.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/pg_lsn.h"
#include "utils/ps_status.h"
#include "utils/timeout.h"
@@ -148,6 +149,31 @@ typedef struct RemoteSlot
static void slotsync_failure_callback(int code, Datum arg);
static void update_synced_slots_inactive_since(void);
+/*
+ * Update slot sync skip stats. This function requires the caller to acquire
+ * the slot.
+ */
+static void
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason)
+{
+ Assert(MyReplicationSlot);
+
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slot when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslot_sync_skip(slot);
+
+ /* Update the slot sync skip reason */
+ if (slot->data.slotsync_skip_reason != skip_reason)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->data.slotsync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
+ }
+}
+
/*
* If necessary, update the local synced slot's metadata based on the data
* from the remote slot.
@@ -218,6 +244,8 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
LSN_FORMAT_ARGS(slot->data.restart_lsn),
slot->data.catalog_xmin));
+ update_slot_sync_skip_stats(slot, SS_SKIP_REMOTE_BEHIND);
+
if (remote_slot_precedes)
*remote_slot_precedes = true;
@@ -261,6 +289,9 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
slot->data.catalog_xmin = remote_slot->catalog_xmin;
SpinLockRelease(&slot->mutex);
+ /* Synchronization happened, update the slot sync skip reason */
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
+
if (found_consistent_snapshot)
*found_consistent_snapshot = true;
}
@@ -277,6 +308,17 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
errdetail_internal("Remote slot has LSN %X/%08X but local slot has LSN %X/%08X.",
LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
LSN_FORMAT_ARGS(slot->data.confirmed_flush)));
+
+ /*
+ * If found_consistent_snapshot is not NULL, a true value means
+ * the slot synchronization was successful, while a false value
+ * means it was skipped (see
+ * update_and_persist_local_synced_slot()). If
+ * found_consistent_snapshot is NULL, no such check exists, so the
+ * stats can be updated directly.
+ */
+ if (!found_consistent_snapshot || *found_consistent_snapshot)
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
}
updated_xmin_or_lsn = true;
@@ -580,6 +622,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
* current location when recreating the slot in the next cycle. It may
* take more time to create such a slot. Therefore, we keep this slot
* and attempt the synchronization in the next cycle.
+ *
+ * We do not need to update the slot sync skip stats here, as they have
+ * already been updated in update_local_synced_slot().
*/
return false;
}
@@ -595,11 +640,21 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
LSN_FORMAT_ARGS(slot->data.restart_lsn)));
+ /*
+ * If a consistent snapshot is not found, update the slot sync skip
+ * stats
+ */
+ update_slot_sync_skip_stats(slot, SS_SKIP_NO_CONSISTENT_SNAPSHOT);
+
return false;
}
ReplicationSlotPersist();
+ /*
+ * For the success case we do not update the slot sync skip stats here, as
+ * they have already been updated in update_local_synced_slot().
+ */
ereport(LOG,
errmsg("newly created replication slot \"%s\" is sync-ready now",
remote_slot->name));
@@ -634,6 +689,26 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
latestFlushPtr = GetStandbyFlushRecPtr(NULL);
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
+ /* If slot is present on the local, update the slot sync skip stats */
+ if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ {
+ bool synced;
+
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
+
+ if (synced)
+ {
+ ReplicationSlotAcquire(NameStr(slot->data.name), true, false);
+
+ if (slot->data.invalidated == RS_INVAL_NONE)
+ update_slot_sync_skip_stats(slot, SS_SKIP_WAL_NOT_FLUSHED);
+
+ ReplicationSlotRelease();
+ }
+ }
+
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
* primary server was not configured correctly.
@@ -939,6 +1014,8 @@ synchronize_slots(WalReceiverConn *wrconn)
if (started_tx)
CommitTransactionCommand();
+ INJECTION_POINT("slot-sync-skip", NULL);
+
return some_slot_updated;
}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1ec1e997b27..0da07b2873e 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -478,6 +478,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->data.two_phase_at = InvalidXLogRecPtr;
slot->data.failover = failover;
slot->data.synced = synced;
+ slot->data.slotsync_skip_reason = SS_SKIP_NONE;
/* and then data only present in shared memory */
slot->just_dirtied = false;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 0478fc9c977..109185ce5a2 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -228,6 +228,28 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * Map a SlotSyncSkipReason enum to a human-readable string
+ */
+static char *
+GetSlotSyncSkipReason(SlotSyncSkipReason reason)
+{
+ switch (reason)
+ {
+ case SS_SKIP_NONE:
+ return "none";
+ case SS_SKIP_REMOTE_BEHIND:
+ return "remote_behind";
+ case SS_SKIP_WAL_NOT_FLUSHED:
+ return "wal_not_flushed";
+ case SS_SKIP_NO_CONSISTENT_SNAPSHOT:
+ return "no_consistent_snapshot";
+ }
+
+ Assert(false);
+ return "none";
+}
+
/*
* pg_get_replication_slots - SQL SRF showing all replication slots
* that currently exist on the database cluster.
@@ -235,7 +257,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +465,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ values[i++] = CStringGetTextDatum(GetSlotSyncSkipReason(slot_contents.data.slotsync_skip_reason));
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index d210c261ac6..89865d615c8 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -102,6 +102,31 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
pgstat_unlock_entry(entry_ref);
}
+/*
+ * Report replication slot sync skip statistics.
+ *
+ * We can rely on the stats for the slot to exist and to belong to this
+ * slot. We can only get here if pgstat_create_replslot() or
+ * pgstat_acquire_replslot() have already been called.
+ */
+void
+pgstat_report_replslot_sync_skip(ReplicationSlot *slot)
+{
+ PgStat_EntryRef *entry_ref;
+ PgStatShared_ReplSlot *shstatent;
+ PgStat_StatReplSlotEntry *statent;
+
+ entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
+ ReplicationSlotIndex(slot), false);
+ shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
+ statent = &shstatent->stats;
+
+ statent->slotsync_skip_count += 1;
+ statent->slotsync_skip_at = GetCurrentTimestamp();
+
+ pgstat_unlock_entry(entry_ref);
+}
+
/*
* Report replication slot creation.
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 1521d6e2ab4..580a238fccd 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2129,7 +2129,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
Datum
pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 11
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 13
text *slotname_text = PG_GETARG_TEXT_P(0);
NameData slotname;
TupleDesc tupdesc;
@@ -2160,7 +2160,11 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 10, "total_bytes",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 11, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 11, "slotsync_skip_count",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 12, "slotsync_skip_at",
+ TIMESTAMPTZOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 13, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
@@ -2186,11 +2190,17 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
values[7] = Int64GetDatum(slotent->mem_exceeded_count);
values[8] = Int64GetDatum(slotent->total_txns);
values[9] = Int64GetDatum(slotent->total_bytes);
+ values[10] = Int64GetDatum(slotent->slotsync_skip_count);
+
+ if (slotent->slotsync_skip_at == 0)
+ nulls[11] = true;
+ else
+ values[11] = TimestampTzGetDatum(slotent->slotsync_skip_at);
if (slotent->stat_reset_timestamp == 0)
- nulls[10] = true;
+ nulls[12] = true;
else
- values[10] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+ values[12] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5cf9e12fcb9..0e209cc43e1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5691,9 +5691,9 @@
{ oid => '6169', descr => 'statistics: information about replication slot',
proname => 'pg_stat_get_replication_slot', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => 'text',
- proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,mem_exceeded_count,total_txns,total_bytes,stats_reset}',
+ proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,mem_exceeded_count,total_txns,total_bytes,slotsync_skip_count,slotsync_skip_at,stats_reset}',
prosrc => 'pg_stat_get_replication_slot' },
{ oid => '6230', descr => 'statistics: check if a stats object exists',
@@ -11511,9 +11511,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slotsync_skip_reason}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a0610bb3e31..e9205700aa3 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -400,6 +400,8 @@ typedef struct PgStat_StatReplSlotEntry
PgStat_Counter mem_exceeded_count;
PgStat_Counter total_txns;
PgStat_Counter total_bytes;
+ PgStat_Counter slotsync_skip_count;
+ TimestampTz slotsync_skip_at;
TimestampTz stat_reset_timestamp;
} PgStat_StatReplSlotEntry;
@@ -745,6 +747,7 @@ extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
extern void pgstat_reset_replslot(const char *name);
struct ReplicationSlot;
extern void pgstat_report_replslot(struct ReplicationSlot *slot, const PgStat_StatReplSlotEntry *repSlotStat);
+extern void pgstat_report_replslot_sync_skip(struct ReplicationSlot *slot);
extern void pgstat_create_replslot(struct ReplicationSlot *slot);
extern void pgstat_acquire_replslot(struct ReplicationSlot *slot);
extern void pgstat_drop_replslot(struct ReplicationSlot *slot);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 09c69f83d57..33a0ad38163 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -71,6 +71,21 @@ typedef enum ReplicationSlotInvalidationCause
/* Maximum number of invalidation causes */
#define RS_INVAL_MAX_CAUSES 4
+/*
+ * When the slot sync worker is running or pg_sync_replication_slots() is
+ * executed, slot synchronization can be skipped. This enum lists the
+ * possible reasons for skipping slot synchronization.
+ */
+typedef enum SlotSyncSkipReason
+{
+ SS_SKIP_NONE, /* No skip */
+ SS_SKIP_WAL_NOT_FLUSHED, /* Standby did not flush the WAL corresponding
+ * to confirmed flush on remote slot */
+ SS_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SS_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not build a consistent
+ * snapshot */
+} SlotSyncSkipReason;
+
/*
* On-Disk data of a replication slot, preserved across restarts.
*/
@@ -141,6 +156,9 @@ typedef struct ReplicationSlotPersistentData
* for logical slots on the primary server.
*/
bool failover;
+
+ /* The reason for last slot sync skip */
+ SlotSyncSkipReason slotsync_skip_reason;
} ReplicationSlotPersistentData;
/*
@@ -248,7 +266,6 @@ typedef struct ReplicationSlot
* position.
*/
XLogRecPtr last_saved_restart_lsn;
-
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 7c52181cbcb..6e6425d01fe 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1507,8 +1507,9 @@ pg_replication_slots| SELECT l.slot_name,
l.conflicting,
l.invalidation_reason,
l.failover,
- l.synced
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
+ l.synced,
+ l.slotsync_skip_reason
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, slotsync_skip_reason)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
@@ -2151,9 +2152,11 @@ pg_stat_replication_slots| SELECT s.slot_name,
s.mem_exceeded_count,
s.total_txns,
s.total_bytes,
+ s.slotsync_skip_count,
+ s.slotsync_skip_at,
s.stats_reset
FROM pg_replication_slots r,
- LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, mem_exceeded_count, total_txns, total_bytes, stats_reset)
+ LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, mem_exceeded_count, total_txns, total_bytes, slotsync_skip_count, slotsync_skip_at, stats_reset)
WHERE (r.datoid IS NOT NULL);
pg_stat_slru| SELECT name,
blks_zeroed,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 432509277c9..17f1446cfd4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2799,6 +2799,7 @@ SlabSlot
SlotInvalidationCauseMap
SlotNumber
SlotSyncCtxStruct
+SlotSyncSkipReason
SlruCtl
SlruCtlData
SlruErrorCause
--
2.34.1
Attachment: v7-0002-Add-test-for-new-stats-for-slot-sync-skip.patch
From b20f5558c26e9e9f2836d4d7792415c542ed9840 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Wed, 12 Nov 2025 15:13:17 +0530
Subject: [PATCH v7 2/2] Add test for new stats for slot sync skip
---
src/test/recovery/meson.build | 1 +
.../recovery/t/050_slotsync_skip_stats.pl | 198 ++++++++++++++++++
2 files changed, 199 insertions(+)
create mode 100644 src/test/recovery/t/050_slotsync_skip_stats.pl
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 523a5cd5b52..17551cf114a 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -58,6 +58,7 @@ tests += {
't/047_checkpoint_physical_slot.pl',
't/048_vacuum_horizon_floor.pl',
't/049_wait_for_lsn.pl',
+ 't/050_slotsync_skip_stats.pl',
],
},
}
diff --git a/src/test/recovery/t/050_slotsync_skip_stats.pl b/src/test/recovery/t/050_slotsync_skip_stats.pl
new file mode 100644
index 00000000000..3f9d235c2b6
--- /dev/null
+++ b/src/test/recovery/t/050_slotsync_skip_stats.pl
@@ -0,0 +1,198 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Skip all tests if injection points are not supported in this build
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize the primary cluster
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 'logical');
+$primary->append_conf(
+ 'postgresql.conf', qq{
+autovacuum = off
+});
+$primary->start;
+
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$primary->check_extension('injection_points'))
+{
+ plan skip_all => 'Extension injection_points not installed';
+}
+
+# Load the injection_points extension
+$primary->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Take a backup of the primary for standby initialization
+my $backup_name = 'backup';
+$primary->backup($backup_name);
+
+# Initialize standby from primary backup
+my $standby = PostgreSQL::Test::Cluster->new('standby');
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+
+my $connstr = $primary->connstr;
+$standby->append_conf(
+ 'postgresql.conf', qq(
+hot_standby_feedback = on
+primary_slot_name = 'sb1_slot'
+primary_conninfo = '$connstr dbname=postgres'
+));
+
+# Create a physical replication slot on primary for standby
+$primary->safe_psql('postgres',
+ q{SELECT pg_create_physical_replication_slot('sb1_slot');});
+
+$standby->start;
+
+# Create a logical replication slot on primary for testing
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Wait for standby to catch up
+$primary->wait_for_replay_catchup($standby);
+
+# Initial sync of replication slots
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Verify that initially there is no skip reason
+my $result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced"
+);
+is($result, 'none', "slot sync reason is none");
+
+# Update pg_hba.conf and restart the primary to reject streaming replication
+# connections. WAL records won't be replicated to the standby until the
+# configuration is restored.
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf(
+ 'pg_hba.conf', qq{
+local all all trust
+host all all 127.0.0.1/32 trust
+host all all ::1/128 trust
+});
+$primary->restart;
+
+# Advance the failover slot so that confirmed flush LSN of remote slot become
+# ahead of standby's flushed LSN
+$primary->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE t1(a int);
+ INSERT INTO t1 VALUES(1);
+ SELECT pg_replication_slot_advance('slot_sync', pg_current_wal_lsn());
+));
+
+my ($stdout, $stderr);
+# Attempt to sync replication slots while standby is behind
+($result, $stdout, $stderr) =
+ $standby->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Verify pg_sync_replication_slots is failing
+ok( $stderr =~ /skipping slot synchronization because the received slot sync/,
+ 'pg_sync_replication_slots failed as expected');
+
+# Check skip reason and count when standby is behind
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'wal_not_flushed', "slot sync skip when standby is behind");
+
+$result = $standby->safe_psql('postgres',
+ "SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Repeat sync to ensure skip count increments
+($result, $stdout, $stderr) =
+ $standby->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+$result = $standby->safe_psql('postgres',
+ "SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '2', "check slot sync skip count");
+
+# Restore streaming replication connection
+$primary->append_conf(
+ 'pg_hba.conf', qq{
+local replication all trust
+host replication all 127.0.0.1/32 trust
+host replication all ::1/128 trust
+});
+$primary->restart;
+
+# Wait for standby to catch up
+$primary->wait_for_replay_catchup($standby);
+
+# Check that skip reason is reset after successful sync
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'none', "slotsync_skip_reason is reset after successful sync");
+
+# Cleanup: drop the logical slot and ensure standby catches up
+$primary->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('slot_sync')");
+$primary->wait_for_replay_catchup($standby);
+
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Test for case when slot sync is skipped when the remote slot is
+# behind the local slot.
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Attach injection point to simulate wait
+my $standby_psql = $standby->background_psql('postgres');
+$standby_psql->query_safe(
+ q(select injection_points_attach('slot-sync-skip','wait')));
+
+# Initiate sync of failover slots
+$standby_psql->query_until(
+ qr/slot_sync/,
+ q(
+\echo slot_sync
+select pg_sync_replication_slots();
+));
+
+# Wait for backend to reach injection point
+$standby->wait_for_event('client backend', 'slot-sync-skip');
+
+# Logical slot is temporary and sync will skip because remote is behind
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND temporary"
+);
+is($result, 'remote_behind', "slot sync skip as remote is behind");
+
+$result = $standby->safe_psql('postgres',
+ "SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Detach injection point
+$standby->safe_psql(
+ 'postgres', q{
+ SELECT injection_points_detach('slot-sync-skip');
+ SELECT injection_points_wakeup('slot-sync-skip');
+});
+
+done_testing();
--
2.34.1
On Fri, 31 Oct 2025 at 11:30, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Shlok,
Thanks for updating the patch. Few comments:
```
The reason for the last slot synchronization skip. This field is set only
for logical slots that are being synced from a primary server (that is,
those whose <structfield>synced</structfield> field is
<literal>true</literal>).
```
What happens if the slot has a skip reason and the standby is promoted?
Will the attribute be retained? If so, do we have to add some notes like "sync"?
I tested it and I agree that the slot sync skip stats will be retained. I
have updated the docs accordingly.
```
+/* Update slot sync skip stats */
+static void
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason,
+							bool acquire_slot)
```
Let's follow existing code; as in ReplicationSlotSetInactiveSince(), the third
argument can be `acquire_lock`.
I have updated it according to the latest comment by Shveta in [1].
```
+	/*
+	 * Update the slot sync related stats in pg_stat_replication_slot when a
+	 * slot sync is skipped
+	 */
+	if (skip_reason != SS_SKIP_NONE)
+		pgstat_report_replslot_sync_skip(slot);
```
Is it OK to call pgstat_report_replslot_sync_skip() without any locks?
I think we need to acquire the slot before the call to
pgstat_report_replslot_sync_skip. This ensures
'pgstat_acquire_replslot' is called and the stats entry for the slot
is already present when we update the stats. Also, in the similar
function 'pgstat_report_replslot', the comment says we should ensure
that pgstat_create_replslot() or pgstat_acquire_replslot() is called
before updating the stats.
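For illustration, the create-or-acquire-before-update ordering discussed above can be sketched outside PostgreSQL in a few lines of Python (the class and field names here are invented for the example and only loosely mirror the pgstat API, they are not the actual implementation):

```python
class ReplSlotStats:
    """Toy stats registry: an entry must be acquired before it is updated."""

    def __init__(self):
        self._entries = {}

    def acquire(self, slot_name):
        # Analogous to pgstat_create_replslot()/pgstat_acquire_replslot():
        # guarantees the entry exists before any update touches it.
        self._entries.setdefault(slot_name, {"slotsync_skip_count": 0})

    def report_sync_skip(self, slot_name):
        # Analogous to the skip-reporting call; assumes the entry was
        # acquired first, otherwise this raises KeyError.
        self._entries[slot_name]["slotsync_skip_count"] += 1
        return self._entries[slot_name]["slotsync_skip_count"]

stats = ReplSlotStats()
stats.acquire("slot_sync")
print(stats.report_sync_skip("slot_sync"))  # → 1
```

Updating without acquiring first would fail, which is exactly the hazard the comment in 'pgstat_report_replslot' warns about.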
```
ReplicationSlotAcquire(NameStr(slot->data.name), true, true);
```
Can you clarify the reason for error_if_invalid=true? Other code in the file uses
error_if_invalid=false.
I agree we should keep error_if_invalid=false, because with
error_if_invalid=true, trying to acquire an invalidated slot throws an
error and exits the slotsync worker.
```
+	/* Update the slot sync skip reason */
+	SpinLockAcquire(&slot->mutex);
+	if (slot->slot_sync_skip_reason != skip_reason)
+		slot->slot_sync_skip_reason = skip_reason;
+	SpinLockRelease(&slot->mutex);
```
Now the replication slot is always acquired. Do we still have to take the
spinlock even for reading the value? In other words, can we move SpinLockAcquire()
and SpinLockRelease() inside the if block?
I agree that we should use SpinLockAcquire() and SpinLockRelease() inside the
if block. I have fixed it.
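The resulting pattern — an unlocked read, with the lock taken only when a write is actually needed — can be mimicked in a standalone Python sketch. This is a toy model, not the slot code itself, and it is safe only under the same assumption the patch relies on: a single writer (the backend that has acquired the slot) updates the field:

```python
import threading

class Slot:
    """Toy stand-in for a replication slot: one writer, many readers."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.skip_reason = "none"

    def set_skip_reason(self, reason):
        # The unlocked read is safe because only the acquiring backend
        # (one writer) ever changes this field; the lock protects readers
        # from seeing a torn write, and is taken only on an actual change.
        if self.skip_reason != reason:
            with self.mutex:
                self.skip_reason = reason

slot = Slot()
slot.set_skip_reason("remote_behind")
slot.set_skip_reason("remote_behind")  # second call skips the lock entirely
print(slot.skip_reason)  # → remote_behind
```

When the value is unchanged in the common case (successful sync cycles), this avoids a lock round-trip on every cycle.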
```
# Copyright (c) 2024-2025, PostgreSQL Global Development Group
```
I think 2024 can be removed.
Fixed
```
my $primary = PostgreSQL::Test::Cluster->new('publisher');
```
s/publisher/primary/.
Fixed
```
# Pause steaming replication connection so that standby can lag behind
unlink($primary->data_dir . '/pg_hba.conf');
$primary->append_conf(
'pg_hba.conf', qq{
local all all trust
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
});
$primary->restart;
```
Not sure it can be called "Pause"; how about something like:
```
Update pg_hba.conf and restart the primary to reject streaming replication connections.
WAL records won't be replicated to the standby until .conf is restored.
```
Fixed
```
# Attempt to sync replication slots while standby is behind
($result, $stdout, $stderr) =
$standby->psql('postgres', "SELECT pg_sync_replication_slots();");
```
Can you verify from $stderr that synchronization failed? I cannot find other
tests which check the message. It is enough to do it once.
Added
```
$result = $standby->safe_psql(
'postgres',
"SELECT slot_sync_skip_reason FROM pg_replication_slots
WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
);
is($result, 'missing_wal_record', "slot sync skip when standby is behind");
```
I found the test is done twice; can we remove the second one?
Fixed
```
# Cleanup: drop the logical slot and ensure standby catches up
$primary->safe_psql('postgres',
"SELECT pg_drop_replication_slot('slot_sync')");
$primary->wait_for_replay_catchup($standby);

$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
# Test for case when slot sync is skipped when the remote slot is
# behind the local slot.
$primary->safe_psql('postgres',
"SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
);
```
Can we use the reset function instead of dropping it?
Here the test checks the 'remote_behind' case for the first slot sync cycle
(when the slot is still temporary). To achieve that we need to recreate
the slot.
I have addressed the comment and attached the updated patch in [1].
[1]: /messages/by-id/CANhcyEXEyUt8TycOqkgdWT+NeJASM=1acU1dK64UNsJeKMQFQA@mail.gmail.com
Thanks,
Shlok Kyal
Dear Shlok,
Thanks for updating the patch. Few more comments.
I’m not sure if this has already been discussed; I couldn’t find any
mention of it in the thread. Why don’t we persist
'slot_sync_skip_reason' (it is outside of
ReplicationSlotPersistentData)? If a slot wasn’t synced during the
last cycle and the server restarts, it would be helpful to know the
reason it wasn’t synced prior to the node restart.

Actually I did not think in this direction. I think it will be useful
to persist 'slot_sync_skip_reason'. I have made the change for the
same in the latest patch.
Hmm, I'm wondering whether it should be written on the disk. Other attributes
on the disk are essential to decode or replicate changes correctly, but the
sync status is not used for that purpose. Personally, I think slot sync would
restart soon after the reboot, so it is OK to start with empty. How about others?
If we want to serialize the info, we should do further tasks:
- update SLOT_VERSION
- make the slot dirty then SaveSlotToPath() when the status is updated.
```
+static void
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason)
+{
+ Assert(MyReplicationSlot);
```
I think no need to require *slot as an argument. We can use the variable to shorten
like update_local_synced_slot().
```
# Verify pg_sync_replication_slots is failing
ok( $stderr =~ /skipping slot synchronization because the received slot sync/,
'pg_sync_replication_slots failed as expected');
```
This may be a matter of taste, but can you check the whole log message? The
latter part indicates the actual reason.
```
# Detach injection point
$standby->safe_psql(
'postgres', q{
SELECT injection_points_detach('slot-sync-skip');
SELECT injection_points_wakeup('slot-sync-skip');
});
```
Not mandatory, but you can quit the background session if you release the
injection point.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Fri, 14 Nov 2025 at 14:13, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Shlok,
Thanks for updating the patch. Few more comments.
I’m not sure if this has already been discussed; I couldn’t find any
mention of it in the thread. Why don’t we persist
'slot_sync_skip_reason' (it is outside of
ReplicationSlotPersistentData)? If a slot wasn’t synced during the
last cycle and the server restarts, it would be helpful to know the
reason it wasn’t synced prior to the node restart.

Actually I did not think in this direction. I think it will be useful
to persist 'slot_sync_skip_reason'. I have made the change for the
same in the latest patch.

Hmm, I'm wondering whether it should be written on the disk. Other attributes
on the disk are essential to decode or replicate changes correctly, but the
sync status is not used for that purpose. Personally, I think slot sync would
restart soon after the reboot, so it is OK to start with empty. How about others?

If we want to serialize the info, we should do further tasks:
- update SLOT_VERSION
- make the slot dirty then SaveSlotToPath() when the status is updated.
I agree with your point. Slot synchronization will restart shortly
after a reboot, so it seems reasonable to begin with an empty state
rather than persisting slot_sync_skip_reason.
For now, I’ve updated the patch so that slot_sync_skip_reason is no
longer persisted; its initialization is kept outside of
ReplicationSlotPersistentData. I’d also like to hear what others
think.
```
+static void
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason)
+{
+	Assert(MyReplicationSlot);
```
I think there is no need to require *slot as an argument. We can use the
variable to shorten it, like update_local_synced_slot() does.
Fixed
```
# Verify pg_sync_replication_slots is failing
ok( $stderr =~ /skipping slot synchronization because the received slot sync/,
'pg_sync_replication_slots failed as expected');
```
This may be a matter of taste, but can you check the whole log message? The
latter part indicates the actual reason.
The latter part of the message contains LSN values, which are not
stable across runs. To avoid hard-coding specific LSNs, I matched the
fixed, non-variable parts of the message while still covering the
reason for the failure.
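As a standalone illustration of that approach (the log lines below are modeled on the skip message but are not the literal server output, and the LSN format is assumed), a pattern can anchor on the stable words and wildcard the variable LSN fields:

```python
import re

# Hypothetical log lines; the LSN values differ between runs.
logs = [
    'LOG:  skipping slot synchronization because the received slot sync'
    ' LSN 0/3000060 is ahead of the standby position 0/3000028',
    'LOG:  skipping slot synchronization because the received slot sync'
    ' LSN 0/5A001F8 is ahead of the standby position 0/5A001C0',
]

# Match the fixed prefix and the reason text; wildcard the LSNs so the
# check stays stable across runs.
pattern = re.compile(
    r'skipping slot synchronization because the received slot sync'
    r' LSN [0-9A-Fa-f]+/[0-9A-Fa-f]+ is ahead of the standby'
)

print(all(pattern.search(line) for line in logs))  # → True
```

This keeps the test covering the actual failure reason without hard-coding run-specific values.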
```
# Detach injection point
$standby->safe_psql(
'postgres', q{
SELECT injection_points_detach('slot-sync-skip');
SELECT injection_points_wakeup('slot-sync-skip');
});
```
Not mandatory, but you can quit the background session if you release the
injection point.
Fixed
Apart from the above changes, I have renamed the functions to
consistently use the term 'slotsync' instead of 'slot_sync':
update_slot_sync_skip_stats -> update_slotsync_skip_stats
pgstat_report_replslot_sync_skip -> pgstat_report_replslotsync_skip
I have attached the updated v8 patch with the latest changes.
Thanks,
Shlok Kyal
Attachments:
v8-0001-Add-stats-related-to-slot-sync-skip.patch
From 3041fdff5a05f918807a8f2154d56cbb616a8d7d Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Mon, 8 Sep 2025 21:04:42 +0530
Subject: [PATCH v8 1/2] Add stats related to slot sync skip
When slot synchronization is performed, it can be skipped for various
reasons. This patch adds statistics about such skips for synced slots:
new columns slotsync_skip_count and last_slotsync_skip_at in the
pg_stat_replication_slots view, and a new column slotsync_skip_reason
in the pg_replication_slots view.
---
contrib/test_decoding/expected/stats.out | 12 +--
doc/src/sgml/monitoring.sgml | 25 ++++++
doc/src/sgml/system-views.sgml | 43 +++++++++++
src/backend/catalog/system_views.sql | 5 +-
src/backend/replication/logical/slotsync.c | 81 ++++++++++++++++++++
src/backend/replication/slot.c | 1 +
src/backend/replication/slotfuncs.c | 26 ++++++-
src/backend/utils/activity/pgstat_replslot.c | 25 ++++++
src/backend/utils/adt/pgstatfuncs.c | 18 ++++-
src/include/catalog/pg_proc.dat | 12 +--
src/include/pgstat.h | 3 +
src/include/replication/slot.h | 17 ++++
src/test/regress/expected/rules.out | 9 ++-
src/tools/pgindent/typedefs.list | 1 +
14 files changed, 257 insertions(+), 21 deletions(-)
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index 28da9123cc8..7426c48040a 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -78,17 +78,17 @@ SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count,
-- verify accessing/resetting stats for non-existent slot does something reasonable
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slotsync_skip_count | last_slotsync_skip_at | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+---------------------+-----------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
SELECT pg_stat_reset_replication_slot('do-not-exist');
ERROR: replication slot "do-not-exist" does not exist
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slotsync_skip_count | last_slotsync_skip_at | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+---------------------+-----------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
-- spilling the xact
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 436ef0e8bd0..894a972b3c1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1659,6 +1659,31 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_skip_count</structfield><type>bigint</type>
+ </para>
+ <para>
+ Number of times slot synchronization was skipped. The value of this
+ column has no meaning on the primary server; it defaults to 0 for all
+ slots, but may (if leftover from a promoted standby) also have a
+ positive value.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_slotsync_skip_at</structfield><type>timestamp with time zone</type>
+ </para>
+ <para>
+ Time at which the last slot synchronization was skipped. The value of this
+ column has no meaning on the primary server; it defaults to NULL for all
+ slots, but may (if leftover from a promoted standby) contain a timestamp.
+ </para>
+ </entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 7971498fe75..08ed9d609e1 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3102,6 +3102,49 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ The reason for the last slot synchronization skip. This field is set only
+ for logical slots that are being synchronized from a primary server (that
+ is, those whose <structfield>synced</structfield> field is
+ <literal>true</literal>). The value of this column has no meaning on the
+ primary server; it defaults to <literal>none</literal> for all slots, but
+ may (if leftover from a promoted standby) also have a value other than
+ <literal>none</literal>. Possible values are:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <literal>none</literal> means that the last slot synchronization
+ completed successfully.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>remote_behind</literal> means that the last slot
+ synchronization was skipped because the slot is ahead of the
+ corresponding failover slot on the primary.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>wal_not_flushed</literal> means that the last slot
+ synchronization was skipped because the standby had not flushed the
+ WAL corresponding to the confirmed flush position on the remote slot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>no_consistent_snapshot</literal> means that the last slot
+ synchronization was skipped because the standby could not build a
+ consistent snapshot.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 95ad29a64b9..4257e7db9be 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1060,7 +1060,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.slotsync_skip_reason
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
@@ -1076,6 +1077,8 @@ CREATE VIEW pg_stat_replication_slots AS
s.mem_exceeded_count,
s.total_txns,
s.total_bytes,
+ s.slotsync_skip_count,
+ s.last_slotsync_skip_at,
s.stats_reset
FROM pg_replication_slots as r,
LATERAL pg_stat_get_replication_slot(slot_name) as s
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8b4afd87dc9..b7d8b991d71 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -64,6 +64,7 @@
#include "storage/procarray.h"
#include "tcop/tcopprot.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/pg_lsn.h"
#include "utils/ps_status.h"
#include "utils/timeout.h"
@@ -148,6 +149,35 @@ typedef struct RemoteSlot
static void slotsync_failure_callback(int code, Datum arg);
static void update_synced_slots_inactive_since(void);
+/*
+ * Update slot sync skip stats. This function requires the caller to acquire
+ * the slot.
+ */
+static void
+update_slotsync_skip_stats(SlotSyncSkipReason skip_reason)
+{
+ ReplicationSlot *slot;
+
+ Assert(MyReplicationSlot);
+
+ slot = MyReplicationSlot;
+
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slots when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslotsync_skip(slot);
+
+ /* Update the slot sync skip reason */
+ if (slot->slotsync_skip_reason != skip_reason)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->slotsync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
+ }
+}
+
/*
* If necessary, update the local synced slot's metadata based on the data
* from the remote slot.
@@ -218,6 +248,8 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
LSN_FORMAT_ARGS(slot->data.restart_lsn),
slot->data.catalog_xmin));
+ update_slotsync_skip_stats(SS_SKIP_REMOTE_BEHIND);
+
if (remote_slot_precedes)
*remote_slot_precedes = true;
@@ -261,6 +293,9 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
slot->data.catalog_xmin = remote_slot->catalog_xmin;
SpinLockRelease(&slot->mutex);
+ /* Synchronization happened, update the slot sync skip reason */
+ update_slotsync_skip_stats(SS_SKIP_NONE);
+
if (found_consistent_snapshot)
*found_consistent_snapshot = true;
}
@@ -277,6 +312,17 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
errdetail_internal("Remote slot has LSN %X/%08X but local slot has LSN %X/%08X.",
LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
LSN_FORMAT_ARGS(slot->data.confirmed_flush)));
+
+ /*
+ * If found_consistent_snapshot is not NULL, a true value means
+ * the slot synchronization was successful, while a false value
+ * means it was skipped (see
+ * update_and_persist_local_synced_slot()). If
+ * found_consistent_snapshot is NULL, no such check exists, so the
+ * stats can be updated directly.
+ */
+ if (!found_consistent_snapshot || *found_consistent_snapshot)
+ update_slotsync_skip_stats(SS_SKIP_NONE);
}
updated_xmin_or_lsn = true;
@@ -580,6 +626,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
* current location when recreating the slot in the next cycle. It may
* take more time to create such a slot. Therefore, we keep this slot
* and attempt the synchronization in the next cycle.
+ *
+ * We do not need to update the slot sync skip stats here as they will
+ * already have been updated in update_local_synced_slot().
*/
return false;
}
@@ -595,11 +644,21 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
LSN_FORMAT_ARGS(slot->data.restart_lsn)));
+ /*
+ * If a consistent snapshot is not found, update the slot sync skip
+ * stats.
+ */
+ update_slotsync_skip_stats(SS_SKIP_NO_CONSISTENT_SNAPSHOT);
+
return false;
}
ReplicationSlotPersist();
+ /*
+ * For the success case we do not update the slot sync skip stats here, as
+ * they have already been updated in update_local_synced_slot().
+ */
ereport(LOG,
errmsg("newly created replication slot \"%s\" is sync-ready now",
remote_slot->name));
@@ -634,6 +693,26 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
latestFlushPtr = GetStandbyFlushRecPtr(NULL);
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
+ /* If the slot is present locally, update the slot sync skip stats */
+ if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ {
+ bool synced;
+
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
+
+ if (synced)
+ {
+ ReplicationSlotAcquire(NameStr(slot->data.name), true, false);
+
+ if (slot->data.invalidated == RS_INVAL_NONE)
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
+
+ ReplicationSlotRelease();
+ }
+ }
+
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
* primary server was not configured correctly.
@@ -939,6 +1018,8 @@ synchronize_slots(WalReceiverConn *wrconn)
if (started_tx)
CommitTransactionCommand();
+ INJECTION_POINT("slot-sync-skip", NULL);
+
return some_slot_updated;
}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1ec1e997b27..86ae99a3ca9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -491,6 +491,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
slot->last_saved_restart_lsn = InvalidXLogRecPtr;
slot->inactive_since = 0;
+ slot->slotsync_skip_reason = SS_SKIP_NONE;
/*
* Create the slot on disk. We haven't actually marked the slot allocated
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 0478fc9c977..66b989027a5 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -228,6 +228,28 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * Map a SlotSyncSkipReason enum to a human-readable string
+ */
+static char *
+GetSlotSyncSkipReason(SlotSyncSkipReason reason)
+{
+ switch (reason)
+ {
+ case SS_SKIP_NONE:
+ return "none";
+ case SS_SKIP_REMOTE_BEHIND:
+ return "remote_behind";
+ case SS_SKIP_WAL_NOT_FLUSHED:
+ return "wal_not_flushed";
+ case SS_SKIP_NO_CONSISTENT_SNAPSHOT:
+ return "no_consistent_snapshot";
+ }
+
+ Assert(false);
+ return "none";
+}
+
/*
* pg_get_replication_slots - SQL SRF showing all replication slots
* that currently exist on the database cluster.
@@ -235,7 +257,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +465,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ values[i++] = CStringGetTextDatum(GetSlotSyncSkipReason(slot_contents.slotsync_skip_reason));
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index d210c261ac6..10cf97bd29a 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -102,6 +102,31 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
pgstat_unlock_entry(entry_ref);
}
+/*
+ * Report replication slot sync skip statistics.
+ *
+ * We can rely on the stats for the slot to exist and to belong to this
+ * slot. We can only get here if pgstat_create_replslot() or
+ * pgstat_acquire_replslot() have already been called.
+ */
+void
+pgstat_report_replslotsync_skip(ReplicationSlot *slot)
+{
+ PgStat_EntryRef *entry_ref;
+ PgStatShared_ReplSlot *shstatent;
+ PgStat_StatReplSlotEntry *statent;
+
+ entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
+ ReplicationSlotIndex(slot), false);
+ shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
+ statent = &shstatent->stats;
+
+ statent->slotsync_skip_count += 1;
+ statent->last_slotsync_skip_at = GetCurrentTimestamp();
+
+ pgstat_unlock_entry(entry_ref);
+}
+
/*
* Report replication slot creation.
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 3d98d064a94..607aa39fd89 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2129,7 +2129,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
Datum
pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 11
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 13
text *slotname_text = PG_GETARG_TEXT_P(0);
NameData slotname;
TupleDesc tupdesc;
@@ -2160,7 +2160,11 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 10, "total_bytes",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 11, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 11, "slotsync_skip_count",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 12, "last_slotsync_skip_at",
+ TIMESTAMPTZOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 13, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
@@ -2186,11 +2190,17 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
values[7] = Int64GetDatum(slotent->mem_exceeded_count);
values[8] = Int64GetDatum(slotent->total_txns);
values[9] = Int64GetDatum(slotent->total_bytes);
+ values[10] = Int64GetDatum(slotent->slotsync_skip_count);
+
+ if (slotent->last_slotsync_skip_at == 0)
+ nulls[11] = true;
+ else
+ values[11] = TimestampTzGetDatum(slotent->last_slotsync_skip_at);
if (slotent->stat_reset_timestamp == 0)
- nulls[10] = true;
+ nulls[12] = true;
else
- values[10] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+ values[12] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index aaadfd8c748..e1dc1d2e8b1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5691,9 +5691,9 @@
{ oid => '6169', descr => 'statistics: information about replication slot',
proname => 'pg_stat_get_replication_slot', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => 'text',
- proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,mem_exceeded_count,total_txns,total_bytes,stats_reset}',
+ proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,mem_exceeded_count,total_txns,total_bytes,slotsync_skip_count,last_slotsync_skip_at,stats_reset}',
prosrc => 'pg_stat_get_replication_slot' },
{ oid => '6230', descr => 'statistics: check if a stats object exists',
@@ -11511,9 +11511,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slotsync_skip_reason}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a68e725259a..1cb96e6d0a6 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -400,6 +400,8 @@ typedef struct PgStat_StatReplSlotEntry
PgStat_Counter mem_exceeded_count;
PgStat_Counter total_txns;
PgStat_Counter total_bytes;
+ PgStat_Counter slotsync_skip_count;
+ TimestampTz last_slotsync_skip_at;
TimestampTz stat_reset_timestamp;
} PgStat_StatReplSlotEntry;
@@ -745,6 +747,7 @@ extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
extern void pgstat_reset_replslot(const char *name);
struct ReplicationSlot;
extern void pgstat_report_replslot(struct ReplicationSlot *slot, const PgStat_StatReplSlotEntry *repSlotStat);
+extern void pgstat_report_replslotsync_skip(struct ReplicationSlot *slot);
extern void pgstat_create_replslot(struct ReplicationSlot *slot);
extern void pgstat_acquire_replslot(struct ReplicationSlot *slot);
extern void pgstat_drop_replslot(struct ReplicationSlot *slot);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 09c69f83d57..aebc40f486d 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -71,6 +71,21 @@ typedef enum ReplicationSlotInvalidationCause
/* Maximum number of invalidation causes */
#define RS_INVAL_MAX_CAUSES 4
+/*
+ * When the slot sync worker is running or pg_sync_replication_slots() is
+ * executed, slot synchronization can be skipped. This enum lists the
+ * possible reasons for such a skip.
+ */
+typedef enum SlotSyncSkipReason
+{
+ SS_SKIP_NONE, /* No skip */
+ SS_SKIP_WAL_NOT_FLUSHED, /* Standby did not flush the wal coresponding
+ * to confirmed flush on remote slot */
+ SS_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SS_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not build a consistent
+ * snapshot */
+} SlotSyncSkipReason;
+
/*
* On-Disk data of a replication slot, preserved across restarts.
*/
@@ -249,6 +264,8 @@ typedef struct ReplicationSlot
*/
XLogRecPtr last_saved_restart_lsn;
+ /* The reason for last slot sync skip */
+ SlotSyncSkipReason slotsync_skip_reason;
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 372a2188c22..872116cb02a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1507,8 +1507,9 @@ pg_replication_slots| SELECT l.slot_name,
l.conflicting,
l.invalidation_reason,
l.failover,
- l.synced
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
+ l.synced,
+ l.slotsync_skip_reason
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, slotsync_skip_reason)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
@@ -2151,9 +2152,11 @@ pg_stat_replication_slots| SELECT s.slot_name,
s.mem_exceeded_count,
s.total_txns,
s.total_bytes,
+ s.slotsync_skip_count,
+ s.last_slotsync_skip_at,
s.stats_reset
FROM pg_replication_slots r,
- LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, mem_exceeded_count, total_txns, total_bytes, stats_reset)
+ LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, mem_exceeded_count, total_txns, total_bytes, slotsync_skip_count, last_slotsync_skip_at, stats_reset)
WHERE (r.datoid IS NOT NULL);
pg_stat_slru| SELECT name,
blks_zeroed,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 23bce72ae64..eb19e2c2363 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2799,6 +2799,7 @@ SlabSlot
SlotInvalidationCauseMap
SlotNumber
SlotSyncCtxStruct
+SlotSyncSkipReason
SlruCtl
SlruCtlData
SlruErrorCause
--
2.34.1
Attachment: v8-0002-Add-test-for-new-stats-for-slot-sync-skip.patch
From bd9df395c639b478df5e62888a01d29eace3e221 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Tue, 18 Nov 2025 15:42:10 +0530
Subject: [PATCH v8 2/2] Add test for new stats for slot sync skip
---
src/test/recovery/meson.build | 1 +
.../recovery/t/050_slotsync_skip_stats.pl | 201 ++++++++++++++++++
2 files changed, 202 insertions(+)
create mode 100644 src/test/recovery/t/050_slotsync_skip_stats.pl
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 523a5cd5b52..17551cf114a 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -58,6 +58,7 @@ tests += {
't/047_checkpoint_physical_slot.pl',
't/048_vacuum_horizon_floor.pl',
't/049_wait_for_lsn.pl',
+ 't/050_slotsync_skip_stats.pl',
],
},
}
diff --git a/src/test/recovery/t/050_slotsync_skip_stats.pl b/src/test/recovery/t/050_slotsync_skip_stats.pl
new file mode 100644
index 00000000000..9313b9cb45b
--- /dev/null
+++ b/src/test/recovery/t/050_slotsync_skip_stats.pl
@@ -0,0 +1,201 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Skip all tests if injection points are not supported in this build
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize the primary cluster
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 'logical');
+$primary->append_conf(
+ 'postgresql.conf', qq{
+autovacuum = off
+});
+$primary->start;
+
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$primary->check_extension('injection_points'))
+{
+ plan skip_all => 'Extension injection_points not installed';
+}
+
+# Load the injection_points extension
+$primary->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Take a backup of the primary for standby initialization
+my $backup_name = 'backup';
+$primary->backup($backup_name);
+
+# Initialize standby from primary backup
+my $standby = PostgreSQL::Test::Cluster->new('standby');
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+
+my $connstr = $primary->connstr;
+$standby->append_conf(
+ 'postgresql.conf', qq(
+hot_standby_feedback = on
+primary_slot_name = 'sb1_slot'
+primary_conninfo = '$connstr dbname=postgres'
+));
+
+# Create a physical replication slot on primary for standby
+$primary->safe_psql('postgres',
+ q{SELECT pg_create_physical_replication_slot('sb1_slot');});
+
+$standby->start;
+
+# Create a logical replication slot on primary for testing
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Wait for standby to catch up
+$primary->wait_for_replay_catchup($standby);
+
+# Initial sync of replication slots
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Verify that initially there is no skip reason
+my $result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced"
+);
+is($result, 'none', "slot sync reason is none");
+
+# Update pg_hba.conf and restart the primary to reject streaming replication
+# connections. WAL records won't be replicated to the standby until the
+# configuration is restored.
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf(
+ 'pg_hba.conf', qq{
+local all all trust
+host all all 127.0.0.1/32 trust
+host all all ::1/128 trust
+});
+$primary->restart;
+
+# Advance the failover slot so that the confirmed flush LSN of the remote
+# slot becomes ahead of the standby's flushed LSN
+$primary->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE t1(a int);
+ INSERT INTO t1 VALUES(1);
+ SELECT pg_replication_slot_advance('slot_sync', pg_current_wal_lsn());
+));
+
+my ($stdout, $stderr);
+# Attempt to sync replication slots while standby is behind
+($result, $stdout, $stderr) =
+ $standby->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Verify pg_sync_replication_slots is failing
+ok( $stderr =~
+ qr/skipping slot synchronization because the received slot sync.*is ahead of the standby position/,
+ 'pg_sync_replication_slots failed as expected');
+
+# Check skip reason and count when standby is behind
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'wal_not_flushed', "slot sync skip when standby is behind");
+
+$result = $standby->safe_psql('postgres',
+ "SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Repeat sync to ensure skip count increments
+($result, $stdout, $stderr) =
+ $standby->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+$result = $standby->safe_psql('postgres',
+ "SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '2', "check slot sync skip count");
+
+# Restore streaming replication connection
+$primary->append_conf(
+ 'pg_hba.conf', qq{
+local replication all trust
+host replication all 127.0.0.1/32 trust
+host replication all ::1/128 trust
+});
+$primary->restart;
+
+# Wait for standby to catch up
+$primary->wait_for_replay_catchup($standby);
+
+# Check that skip reason is reset after successful sync
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'none', "slotsync_skip_reason is reset after successful sync");
+
+# Cleanup: drop the logical slot and ensure standby catches up
+$primary->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('slot_sync')");
+$primary->wait_for_replay_catchup($standby);
+
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Test for case when slot sync is skipped when the remote slot is
+# behind the local slot.
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Attach injection point to simulate wait
+my $standby_psql = $standby->background_psql('postgres');
+$standby_psql->query_safe(
+ q(select injection_points_attach('slot-sync-skip','wait')));
+
+# Initiate sync of failover slots
+$standby_psql->query_until(
+ qr/slot_sync/,
+ q(
+\echo slot_sync
+select pg_sync_replication_slots();
+));
+
+# Wait for backend to reach injection point
+$standby->wait_for_event('client backend', 'slot-sync-skip');
+
+# Logical slot is temporary and sync will skip because remote is behind
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND temporary"
+);
+is($result, 'remote_behind', "slot sync skip as remote is behind");
+
+$result = $standby->safe_psql('postgres',
+ "SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Detach injection point
+$standby->safe_psql(
+ 'postgres', q{
+ SELECT injection_points_detach('slot-sync-skip');
+ SELECT injection_points_wakeup('slot-sync-skip');
+});
+
+$standby_psql->quit;
+
+done_testing();
--
2.34.1
On Tue, Nov 18, 2025 at 4:07 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

On Fri, 14 Nov 2025 at 14:13, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:

Dear Shlok,

Thanks for updating the patch. A few more comments.

I’m not sure if this has already been discussed; I couldn’t find any mention of it in the thread. Why don’t we persist 'slot_sync_skip_reason' (it is outside of ReplicationSlotPersistentData)? If a slot wasn’t synced during the last cycle and the server restarts, it would be helpful to know the reason it wasn’t synced prior to the node restart.

Actually I did not think in this direction. I think it will be useful to persist 'slot_sync_skip_reason'. I have made the change for the same in the latest patch.

Hmm, I'm wondering whether it should be written to disk. The other attributes on disk are essential to decode or replicate changes correctly, but the sync status is not used for that purpose. Personally, I would expect slot sync to restart soon after a reboot, so it is OK to start with an empty value. How about others?

If we want to serialize the info, we should do further tasks:
- update SLOT_VERSION
- make the slot dirty and then SaveSlotToPath() when the status is updated.

I agree with your point. Slot synchronization will restart shortly after a reboot, so it seems reasonable to begin with an empty state rather than persisting slot_sync_skip_reason.

For now, I’ve updated the patch so that slot_sync_skip_reason is no longer persisted; its initialization is kept outside of ReplicationSlotPersistentData. I’d also like to hear what others think.

Users may even use an API to synchronize the slots rather than the slotsync worker. In that case synchronization won't start immediately after server restart. Having said that, overall the statistical data getting lost on restart doesn't seem right to me. But I would like to know what others think.

```
+static void
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason)
+{
+    Assert(MyReplicationSlot);
```

I think there is no need to require *slot as an argument. We can use the variable to shorten it, like update_local_synced_slot().

Fixed.

```
# Verify pg_sync_replication_slots is failing
ok( $stderr =~ /skipping slot synchronization because the received slot sync/,
    'pg_sync_replication_slots failed as expected');
```

This may be a matter of taste, but can you check the whole log message? The latter part indicates the actual reason.

The latter part of the message contains LSN values, which are not stable across runs. To avoid hard-coding specific LSNs, I matched the fixed, non-variable parts of the message while still covering the reason for the failure.

```
# Detach injection point
$standby->safe_psql(
    'postgres', q{
    SELECT injection_points_detach('slot-sync-skip');
    SELECT injection_points_wakeup('slot-sync-skip');
});
```

Not mandatory, but you can quit the background session once you release the injection point.

Fixed.

Apart from the above changes, I have renamed the functions to consistently use the term 'slotsync' instead of 'slot_sync':
update_slot_sync_skip_stats -> update_slotsync_skip_stats
pgstat_report_replslot_sync_skip -> pgstat_report_replslotsync_skip

I have attached the updated v8 patch with the latest changes.

Thanks,
Shlok Kyal
On Fri, Nov 21, 2025 at 8:52 AM shveta malik <shveta.malik@gmail.com> wrote:
Users may even use an API to synchronize the slots rather than
slotsync worker. In that case synchronization won't start immediately
after server-restart.
But I think after restart in most cases, the slot will be created
fresh as we persist the slot for the first time only when sync is
successful. Now, when the standby has not flushed the WAL
corresponding to remote_lsn (SS_SKIP_WAL_NOT_FLUSHED), the slotsync
can be skipped even for persisted slots but that should be rare and we
anyway won't be able to persist the slot_skip reason in other cases as
slot itself won't be persisted by that time. So, I feel keeping the
slot_sync_skip_reason in memory is sufficient.
--
With Regards,
Amit Kapila.
On Fri, Nov 21, 2025 at 9:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
But I think after restart in most cases, the slot will be created
successful. Now, when the standby has not flushed the WAL
corresponding to remote_lsn (SS_SKIP_WAL_NOT_FLUSHED), the slotsync
can be skipped even for persisted slots but that should be rare and we
anyway won't be able to persist the slot_skip reason in other cases as
slot itself won't be persisted by that time. So, I feel keeping the
slot_sync_skip_reason in memory is sufficient.
Okay, makes sense.
A few comments on 001:
1)
+ slots, but may (if leftover from a promotedstandby) contain a
timestamp.
promotedstandby --> promoted standby
2)
+ s.slotsync_skip_count,
+ s.last_slotsync_skip_at,
Shall we rename last_slotsync_skip_at to slotsync_last_skip_at. That
way all slotsync related stats columns will have same prefix.
3)
+#include "utils/injection_point.h"
+ INJECTION_POINT("slot-sync-skip", NULL);
I think we can move both to patch 002 as these are needed for test alone.
4)
+
+ /*
+ * If found_consistent_snapshot is not NULL, a true value means
+ * the slot synchronization was successful, while a false value
+ * means it was skipped (see
+ * update_and_persist_local_synced_slot()). If
+ * found_consistent_snapshot is NULL, no such check exists, so the
+ * stats can be updated directly.
+ */
+ if (!found_consistent_snapshot || *found_consistent_snapshot)
+ update_slotsync_skip_stats(SS_SKIP_NONE);
I see that when 'found_consistent_snapshot' is true we update stats
here but when it is false, we update stats in the caller. Also for
'remote_slot_precedes' case, we update stats to SS_SKIP_REMOTE_BEHIND
here itself. I think for 'SS_SKIP_NO_CONSISTENT_SNAPSHOT' as well, we
should update stats here instead of caller.
We can do this:
update_local_synced_slot()
{
skip_reason = none;
if (remote is behind)
skip_reason = SS_SKIP_REMOTE_BEHIND;
if (found_consistent_snapshot && (*found_consistent_snapshot == false))
skip_reason = SS_SKIP_NO_CONSISTENT_SNAPSHOT;
--Later in this function, when syncing is done:
update_slotsync_skip_stats(skip_reason)
}
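For clarity, the single-exit flow sketched in the pseudocode above can be modelled as standalone C. This is only an illustrative sketch: SlotSyncSkipReason, compute_skip_reason, and the boolean inputs are stand-ins for the real slotsync state, not the actual PostgreSQL code.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the skip reasons discussed in this thread. */
typedef enum
{
	SS_SKIP_NONE,
	SS_SKIP_REMOTE_BEHIND,
	SS_SKIP_NO_CONSISTENT_SNAPSHOT
} SlotSyncSkipReason;

/*
 * Compute the skip reason in one place, so the stats update can happen at
 * a single point when syncing is done, instead of being scattered across
 * several call sites.  As in the pseudocode, found_consistent_snapshot
 * may be NULL, meaning no snapshot check applies.
 */
static SlotSyncSkipReason
compute_skip_reason(bool remote_slot_precedes,
					const bool *found_consistent_snapshot)
{
	SlotSyncSkipReason skip_reason = SS_SKIP_NONE;

	if (remote_slot_precedes)
		skip_reason = SS_SKIP_REMOTE_BEHIND;
	if (found_consistent_snapshot && !*found_consistent_snapshot)
		skip_reason = SS_SKIP_NO_CONSISTENT_SNAPSHOT;

	return skip_reason;
}
```

The single call to the stats-update function would then take this computed reason, which is the point of the suggestion.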
5)
+ if (synced)
+ {
+ ReplicationSlotAcquire(NameStr(slot->data.name), true, false);
+
+ if (slot->data.invalidated == RS_INVAL_NONE)
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
+
+ ReplicationSlotRelease();
+ }
Shall we check 'slot->data.invalidated' along with the 'synced'
condition? That way, there is no need to acquire or release the slot if it
is invalidated. We can fetch 'invalidated' under the same SpinLock
itself.
6)
+ SS_SKIP_WAL_NOT_FLUSHED, /* Standby did not flush the wal coresponding
+ * to confirmed flush on remote slot */
on --> of
coresponding --> corresponding
thanks
Shveta
On Fri, Nov 21, 2025 at 11:00 AM shveta malik <shveta.malik@gmail.com> wrote:
A few comments on 001:
1)
+ slots, but may (if leftover from a promotedstandby) contain a
timestamp.
promotedstandby --> promoted standby
2)
+ s.slotsync_skip_count,
+ s.last_slotsync_skip_at,
Shall we rename last_slotsync_skip_at to slotsync_last_skip_at? That
way all slotsync related stats columns will have the same prefix.
Sounds reasonable, especially since the doc explains that this is the
time at which the last slot synchronization was skipped.
BTW, can we split the patch into two? First for slot sync skip stats,
and the second one for SlotSyncSkipReason? It would be easier to
review and commit that way.
--
With Regards,
Amit Kapila.
On Fri, 21 Nov 2025 at 11:00, shveta malik <shveta.malik@gmail.com> wrote:
On Fri, Nov 21, 2025 at 9:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 21, 2025 at 8:52 AM shveta malik <shveta.malik@gmail.com> wrote:
On Tue, Nov 18, 2025 at 4:07 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
On Fri, 14 Nov 2025 at 14:13, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Shlok,
Thanks for updating the patch. Few more comments.
I’m not sure if this has already been discussed; I couldn’t find any
mention of it in the thread. Why don’t we persist
'slot_sync_skip_reason' (it is outside of
ReplicationSlotPersistentData)? If a slot wasn’t synced during the
last cycle and the server restarts, it would be helpful to know the
reason it wasn’t synced prior to the node restart.
Actually I did not think in this direction. I think it will be useful
to persist 'slot_sync_skip_reason'. I have made the change for the
same in the latest patch.
Hmm, I'm wondering it should be written on the disk. Other attributes on the disk
are essential to decode or replicate changes correctly, but sync status is not
used for the purpose. Personally considered, slot sync would re-start soon after
the reboot so that it is OK to start with empty. How about others?
If we want to serialize the info, we should do further tasks:
- update SLOT_VERSION
- make the slot dirty then SaveSlotToPath() when the status is updated.
I agree with your point. Slot synchronization will restart shortly
after a reboot, so it seems reasonable to begin with an empty state
rather than persisting slot_sync_skip_reason.
For now, I’ve updated the patch so that slot_sync_skip_reason is no
longer persisted; its initialization is kept outside of
ReplicationSlotPersistentData. I’d also like to hear what others
think.
Users may even use an API to synchronize the slots rather than the
slotsync worker. In that case synchronization won't start immediately
after server restart.
But I think after restart, in most cases, the slot will be created
fresh as we persist the slot for the first time only when sync is
successful. Now, when the standby has not flushed the WAL
corresponding to remote_lsn (SS_SKIP_WAL_NOT_FLUSHED), the slotsync
can be skipped even for persisted slots but that should be rare and we
anyway won't be able to persist the slot_skip reason in other cases as
slot itself won't be persisted by that time. So, I feel keeping the
slot_sync_skip_reason in memory is sufficient.
Okay, makes sense.
A few comments on 001:
1)
+ slots, but may (if leftover from a promotedstandby) contain a
timestamp.
promotedstandby --> promoted standby
2)
+ s.slotsync_skip_count,
+ s.last_slotsync_skip_at,
Shall we rename last_slotsync_skip_at to slotsync_last_skip_at? That
way all slotsync related stats columns will have the same prefix.
3)
+#include "utils/injection_point.h"
+ INJECTION_POINT("slot-sync-skip", NULL);
I think we can move both to patch 002 as these are needed for test alone.
I have merged the test patch in the main patch and split the patch as
per Amit's suggestion in [1].
So this change will not be required.
4)
+
+ /*
+ * If found_consistent_snapshot is not NULL, a true value means
+ * the slot synchronization was successful, while a false value
+ * means it was skipped (see
+ * update_and_persist_local_synced_slot()). If
+ * found_consistent_snapshot is NULL, no such check exists, so the
+ * stats can be updated directly.
+ */
+ if (!found_consistent_snapshot || *found_consistent_snapshot)
+ update_slotsync_skip_stats(SS_SKIP_NONE);
I see that when 'found_consistent_snapshot' is true we update stats
here but when it is false, we update stats in the caller. Also for
'remote_slot_precedes' case, we update stats to SS_SKIP_REMOTE_BEHIND
here itself. I think for 'SS_SKIP_NO_CONSISTENT_SNAPSHOT' as well, we
should update stats here instead of caller.
We can do this:
update_local_synced_slot()
{
skip_reason = none;
if (remote is behind)
skip_reason = SS_SKIP_REMOTE_BEHIND;
if (found_consistent_snapshot && (*found_consistent_snapshot == false))
skip_reason = SS_SKIP_NO_CONSISTENT_SNAPSHOT;
--Later in this function, when syncing is done:
update_slotsync_skip_stats(skip_reason)
}
I agree with your suggestion that we should update the stats for the
case of 'SS_SKIP_NO_CONSISTENT_SNAPSHOT' in update_local_synced_slot
instead of update_and_persist_local_synced_slot. I have made the
change for the same.
Regarding your second suggestion of using the 'skip_reason' variable:
for the 'if (remote is behind)' case, we return inside that if
condition itself, so for this case we need to call
update_slotsync_skip_stats directly. But for the case of
'found_consistent_snapshot' we can optimise it.
5)
+ if (synced)
+ {
+ ReplicationSlotAcquire(NameStr(slot->data.name), true, false);
+
+ if (slot->data.invalidated == RS_INVAL_NONE)
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
+
+ ReplicationSlotRelease();
+ }
Shall we check 'slot->data.invalidated' along with the 'synced'
condition? That way, there is no need to acquire or release the slot if it
is invalidated. We can fetch 'invalidated' under the same SpinLock
itself.
Also, I think that to check 'slot->data.invalidated' we need to
acquire the slot, because a race condition similar to the one
described in the comments below can occur:
* The slot has been synchronized before.
*
* It is important to acquire the slot here before checking
* invalidation. If we don't acquire the slot first, there could be a
* race condition that the local slot could be invalidated just after
* checking the 'invalidated' flag here and we could end up
* overwriting 'invalidated' flag to remote_slot's value. See
* InvalidatePossiblyObsoleteSlot() where it invalidates slot directly
* if the slot is not acquired by other processes.
*
I thought about it and I agree with your suggestion in [2] to add a
new reason in slotsync_skip_reason to indicate the slot skip is
happening due to an invalidated slot.
I think it is cleaner and would avoid confusion for users. I made the
changes for the same in the latest patch.
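The outcome of this point, reporting a dedicated reason when the slot turns out to be invalidated rather than counting it as a WAL-not-flushed skip, can be sketched as standalone C. Everything here (MockSlot, the no-op lock helpers, and the MOCK_* reason names) is hypothetical scaffolding for illustration, not the actual slot or pgstat code.

```c
#include <stdbool.h>

/* No-op stand-ins for SpinLockAcquire()/SpinLockRelease(). */
static void spin_lock(int *lock)	{ *lock = 1; }
static void spin_unlock(int *lock)	{ *lock = 0; }

/* Hypothetical reason names; the real enum values may differ. */
typedef enum
{
	MOCK_SKIP_WAL_NOT_FLUSHED,
	MOCK_SKIP_SLOT_INVALIDATED
} MockSkipReason;

/* Simplified stand-in for the ReplicationSlot fields involved. */
typedef struct
{
	int			mutex;			/* models the slot's spinlock */
	bool		synced;
	bool		invalidated;	/* true models invalidated != RS_INVAL_NONE */
} MockSlot;

/*
 * Read 'synced' and 'invalidated' under the lock, then classify the skip:
 * an invalidated slot gets its own reason instead of being counted as a
 * WAL-not-flushed skip.
 */
static MockSkipReason
classify_skip(MockSlot *slot)
{
	bool		synced,
				invalidated;

	spin_lock(&slot->mutex);
	synced = slot->synced;
	invalidated = slot->invalidated;
	spin_unlock(&slot->mutex);

	(void) synced;				/* a real caller would return early if !synced */
	return invalidated ? MOCK_SKIP_SLOT_INVALIDATED
		: MOCK_SKIP_WAL_NOT_FLUSHED;
}
```

This mirrors the intent rather than the implementation: in the actual patch the slot must still be acquired before the check, for the race-condition reason quoted above.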
6)
+ SS_SKIP_WAL_NOT_FLUSHED, /* Standby did not flush the wal coresponding
+ * to confirmed flush on remote slot */
on --> of
coresponding --> corresponding
I have also addressed the remaining comments. I have attached the
updated v9 patches.
[1]: /messages/by-id/CAA4eK1JM+roQMyXekvwJprMMaK_-HL+n5twinZQ8fufnDEU28g@mail.gmail.com
[2]: /messages/by-id/CAJpy0uC9WsJhc-qeFmz_JTPjW1vH3Zm+zS6jX0PKTY6vtEp38w@mail.gmail.com
Thanks,
Shlok Kyal
Attachments:
v9-0001-Add-slotsync-skip-statistics.patchapplication/octet-stream; name=v9-0001-Add-slotsync-skip-statistics.patchDownload
From d7133f583cf5fd101777b91bd85d4f0ecad8897a Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Fri, 21 Nov 2025 12:35:46 +0530
Subject: [PATCH v9 1/2] Add slotsync skip statistics
This patch introduces two new columns, slotsync_skip_count and
slotsync_last_skip_at, to the pg_stat_replication_slots view. These
columns indicate the number of times the slotsync was skipped and the
last time at which it was skipped.
---
contrib/test_decoding/expected/stats.out | 12 +-
doc/src/sgml/monitoring.sgml | 25 +++
src/backend/catalog/system_views.sql | 2 +
src/backend/replication/logical/slotsync.c | 49 +++++
src/backend/utils/activity/pgstat_replslot.c | 25 +++
src/backend/utils/adt/pgstatfuncs.c | 18 +-
src/include/catalog/pg_proc.dat | 6 +-
src/include/pgstat.h | 3 +
src/test/recovery/meson.build | 1 +
.../recovery/t/050_slotsync_skip_stats.pl | 176 ++++++++++++++++++
src/test/regress/expected/rules.out | 4 +-
11 files changed, 307 insertions(+), 14 deletions(-)
create mode 100644 src/test/recovery/t/050_slotsync_skip_stats.pl
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index 28da9123cc8..d0749bc0daf 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -78,17 +78,17 @@ SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count,
-- verify accessing/resetting stats for non-existent slot does something reasonable
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slotsync_skip_count | slotsync_last_skip_at | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+---------------------+-----------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
SELECT pg_stat_reset_replication_slot('do-not-exist');
ERROR: replication slot "do-not-exist" does not exist
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slotsync_skip_count | slotsync_last_skip_at | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+---------------------+-----------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
-- spilling the xact
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 436ef0e8bd0..b054ebb4ade 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1659,6 +1659,31 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_skip_count</structfield><type>bigint</type>
+ </para>
+ <para>
+ Number of times the slot synchronization is skipped. The value of this
+ column has no meaning on the primary server; it defaults to 0 for all
+ slots, but may (if leftover from a promoted standby) also have a
+ positive value.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_last_skip_at</structfield><type>timestamp with time zone</type>
+ </para>
+ <para>
+ Time at which last slot synchronization was skipped. The value of this
+ column has no meaning on the primary server; it defaults to NULL for all
+ slots, but may (if leftover from a promoted standby) contain a timestamp.
+ </para>
+ </entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 95ad29a64b9..c77a5d15a2e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1076,6 +1076,8 @@ CREATE VIEW pg_stat_replication_slots AS
s.mem_exceeded_count,
s.total_txns,
s.total_bytes,
+ s.slotsync_skip_count,
+ s.slotsync_last_skip_at,
s.stats_reset
FROM pg_replication_slots as r,
LATERAL pg_stat_get_replication_slot(slot_name) as s
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8b4afd87dc9..052117f0481 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -64,6 +64,7 @@
#include "storage/procarray.h"
#include "tcop/tcopprot.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/pg_lsn.h"
#include "utils/ps_status.h"
#include "utils/timeout.h"
@@ -218,6 +219,9 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
LSN_FORMAT_ARGS(slot->data.restart_lsn),
slot->data.catalog_xmin));
+ /* Update slot sync skip stats */
+ pgstat_report_replslotsync_skip(slot);
+
if (remote_slot_precedes)
*remote_slot_precedes = true;
@@ -277,6 +281,17 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
errdetail_internal("Remote slot has LSN %X/%08X but local slot has LSN %X/%08X.",
LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
LSN_FORMAT_ARGS(slot->data.confirmed_flush)));
+
+ /*
+ * If found_consistent_snapshot is not NULL, a true value means
+ * the slot synchronization was successful, while a false value
+ * means it was skipped (see
+ * update_and_persist_local_synced_slot()). If
+ * found_consistent_snapshot is NULL, no such check exists,
+ * indicating slot synchronization is successful.
+ */
+ if (found_consistent_snapshot && !(*found_consistent_snapshot))
+ pgstat_report_replslotsync_skip(slot);
}
updated_xmin_or_lsn = true;
@@ -580,6 +595,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
* current location when recreating the slot in the next cycle. It may
* take more time to create such a slot. Therefore, we keep this slot
* and attempt the synchronization in the next cycle.
+ *
+ * We do not need to update the slot sync skip stats here as it will
+ * be already updated in function update_local_synced_slot.
*/
return false;
}
@@ -590,6 +608,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
*/
if (!found_consistent_snapshot)
{
+ /*
+ * We do not need to update the slot sync skip stats here as it will
+ * be already updated in function update_local_synced_slot.
+ */
ereport(LOG,
errmsg("could not synchronize replication slot \"%s\"", remote_slot->name),
errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
@@ -600,6 +622,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlotPersist();
+ /*
+ * For the success case we do not update the slot sync skip stats here as
+ * it is already updated in update_local_synced_slot.
+ */
ereport(LOG,
errmsg("newly created replication slot \"%s\" is sync-ready now",
remote_slot->name));
@@ -634,6 +660,25 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
latestFlushPtr = GetStandbyFlushRecPtr(NULL);
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
+ /* If slot is present on the local, update the slot sync skip stats */
+ if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ {
+ bool synced;
+
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
+
+ if (synced)
+ {
+ ReplicationSlotAcquire(NameStr(slot->data.name), true, false);
+
+ pgstat_report_replslotsync_skip(slot);
+
+ ReplicationSlotRelease();
+ }
+ }
+
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
* primary server was not configured correctly.
@@ -707,6 +752,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
/* Skip the sync of an invalidated slot */
if (slot->data.invalidated != RS_INVAL_NONE)
{
+ pgstat_report_replslotsync_skip(slot);
+
ReplicationSlotRelease();
return slot_updated;
}
@@ -939,6 +986,8 @@ synchronize_slots(WalReceiverConn *wrconn)
if (started_tx)
CommitTransactionCommand();
+ INJECTION_POINT("slot-sync-skip", NULL);
+
return some_slot_updated;
}
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index d210c261ac6..d9cc4ec2314 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -102,6 +102,31 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
pgstat_unlock_entry(entry_ref);
}
+/*
+ * Report replication slot sync skip statistics.
+ *
+ * We can rely on the stats for the slot to exist and to belong to this
+ * slot. We can only get here if pgstat_create_replslot() or
+ * pgstat_acquire_replslot() have already been called.
+ */
+void
+pgstat_report_replslotsync_skip(ReplicationSlot *slot)
+{
+ PgStat_EntryRef *entry_ref;
+ PgStatShared_ReplSlot *shstatent;
+ PgStat_StatReplSlotEntry *statent;
+
+ entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
+ ReplicationSlotIndex(slot), false);
+ shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
+ statent = &shstatent->stats;
+
+ statent->slotsync_skip_count += 1;
+ statent->slotsync_last_skip_at = GetCurrentTimestamp();
+
+ pgstat_unlock_entry(entry_ref);
+}
+
/*
* Report replication slot creation.
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 3d98d064a94..46e103ce7c3 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2129,7 +2129,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
Datum
pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 11
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 13
text *slotname_text = PG_GETARG_TEXT_P(0);
NameData slotname;
TupleDesc tupdesc;
@@ -2160,7 +2160,11 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 10, "total_bytes",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 11, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 11, "slotsync_skip_count",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 12, "slotsync_last_skip_at",
+ TIMESTAMPTZOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 13, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
@@ -2186,11 +2190,17 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
values[7] = Int64GetDatum(slotent->mem_exceeded_count);
values[8] = Int64GetDatum(slotent->total_txns);
values[9] = Int64GetDatum(slotent->total_bytes);
+ values[10] = Int64GetDatum(slotent->slotsync_skip_count);
+
+ if (slotent->slotsync_last_skip_at == 0)
+ nulls[11] = true;
+ else
+ values[11] = TimestampTzGetDatum(slotent->slotsync_last_skip_at);
if (slotent->stat_reset_timestamp == 0)
- nulls[10] = true;
+ nulls[12] = true;
else
- values[10] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+ values[12] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index aaadfd8c748..b10809ba9b6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5691,9 +5691,9 @@
{ oid => '6169', descr => 'statistics: information about replication slot',
proname => 'pg_stat_get_replication_slot', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => 'text',
- proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,mem_exceeded_count,total_txns,total_bytes,stats_reset}',
+ proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,mem_exceeded_count,total_txns,total_bytes,slotsync_skip_count,slotsync_last_skip_at,stats_reset}',
prosrc => 'pg_stat_get_replication_slot' },
{ oid => '6230', descr => 'statistics: check if a stats object exists',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a68e725259a..144042c1940 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -400,6 +400,8 @@ typedef struct PgStat_StatReplSlotEntry
PgStat_Counter mem_exceeded_count;
PgStat_Counter total_txns;
PgStat_Counter total_bytes;
+ PgStat_Counter slotsync_skip_count;
+ TimestampTz slotsync_last_skip_at;
TimestampTz stat_reset_timestamp;
} PgStat_StatReplSlotEntry;
@@ -745,6 +747,7 @@ extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
extern void pgstat_reset_replslot(const char *name);
struct ReplicationSlot;
extern void pgstat_report_replslot(struct ReplicationSlot *slot, const PgStat_StatReplSlotEntry *repSlotStat);
+extern void pgstat_report_replslotsync_skip(struct ReplicationSlot *slot);
extern void pgstat_create_replslot(struct ReplicationSlot *slot);
extern void pgstat_acquire_replslot(struct ReplicationSlot *slot);
extern void pgstat_drop_replslot(struct ReplicationSlot *slot);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 523a5cd5b52..17551cf114a 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -58,6 +58,7 @@ tests += {
't/047_checkpoint_physical_slot.pl',
't/048_vacuum_horizon_floor.pl',
't/049_wait_for_lsn.pl',
+ 't/050_slotsync_skip_stats.pl',
],
},
}
diff --git a/src/test/recovery/t/050_slotsync_skip_stats.pl b/src/test/recovery/t/050_slotsync_skip_stats.pl
new file mode 100644
index 00000000000..39ce9ef702b
--- /dev/null
+++ b/src/test/recovery/t/050_slotsync_skip_stats.pl
@@ -0,0 +1,176 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Skip all tests if injection points are not supported in this build
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize the primary cluster
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 'logical');
+$primary->append_conf(
+ 'postgresql.conf', qq{
+autovacuum = off
+});
+$primary->start;
+
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$primary->check_extension('injection_points'))
+{
+ plan skip_all => 'Extension injection_points not installed';
+}
+
+# Load the injection_points extension
+$primary->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Take a backup of the primary for standby initialization
+my $backup_name = 'backup';
+$primary->backup($backup_name);
+
+# Initialize standby from primary backup
+my $standby = PostgreSQL::Test::Cluster->new('standby');
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+
+my $connstr = $primary->connstr;
+$standby->append_conf(
+ 'postgresql.conf', qq(
+hot_standby_feedback = on
+primary_slot_name = 'sb1_slot'
+primary_conninfo = '$connstr dbname=postgres'
+));
+
+# Create a physical replication slot on primary for standby
+$primary->safe_psql('postgres',
+ q{SELECT pg_create_physical_replication_slot('sb1_slot');});
+
+$standby->start;
+
+# Create a logical replication slot on primary for testing
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Wait for standby to catch up
+$primary->wait_for_replay_catchup($standby);
+
+# Initial sync of replication slots
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Verify slot is synced successfully
+my $result = $standby->safe_psql('postgres',
+ "SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '0', "check slot sync skip count after initial sync");
+
+# Update pg_hba.conf and restart the primary to reject streaming replication
+# connections. WAL records won't be replicated to the standby until the
+# configuration is restored.
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf(
+ 'pg_hba.conf', qq{
+local all all trust
+host all all 127.0.0.1/32 trust
+host all all ::1/128 trust
+});
+$primary->restart;
+
+# Advance the failover slot so that confirmed flush LSN of remote slot become
+# ahead of standby's flushed LSN
+$primary->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE t1(a int);
+ INSERT INTO t1 VALUES(1);
+ SELECT pg_replication_slot_advance('slot_sync', pg_current_wal_lsn());
+));
+
+my ($stdout, $stderr);
+# Attempt to sync replication slots while standby is behind
+($result, $stdout, $stderr) =
+ $standby->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Verify pg_sync_replication_slots is failing
+ok( $stderr =~
+ qr/skipping slot synchronization because the received slot sync.*is ahead of the standby position/,
+ 'pg_sync_replication_slots failed as expected');
+
+# Check slot sync skip count when standby is behind
+$result = $standby->safe_psql('postgres',
+ "SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Repeat sync to ensure skip count increments
+($result, $stdout, $stderr) =
+ $standby->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+$result = $standby->safe_psql('postgres',
+ "SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '2', "check slot sync skip count");
+
+# Restore streaming replication connection
+$primary->append_conf(
+ 'pg_hba.conf', qq{
+local replication all trust
+host replication all 127.0.0.1/32 trust
+host replication all ::1/128 trust
+});
+$primary->restart;
+
+# Wait for standby to catch up
+$primary->wait_for_replay_catchup($standby);
+
+# Cleanup: drop the logical slot and ensure standby catches up
+$primary->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('slot_sync')");
+$primary->wait_for_replay_catchup($standby);
+
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Test for case when slot sync is skipped when the remote slot is
+# behind the local slot.
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Attach injection point to simulate wait
+my $standby_psql = $standby->background_psql('postgres');
+$standby_psql->query_safe(
+ q(select injection_points_attach('slot-sync-skip','wait')));
+
+# Initiate sync of failover slots
+$standby_psql->query_until(
+ qr/slot_sync/,
+ q(
+\echo slot_sync
+select pg_sync_replication_slots();
+));
+
+# Wait for backend to reach injection point
+$standby->wait_for_event('client backend', 'slot-sync-skip');
+
+# Logical slot is temporary and sync will skip because remote is behind
+$result = $standby->safe_psql('postgres',
+ "SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Detach injection point
+$standby->safe_psql(
+ 'postgres', q{
+ SELECT injection_points_detach('slot-sync-skip');
+ SELECT injection_points_wakeup('slot-sync-skip');
+});
+
+$standby_psql->quit;
+
+done_testing();
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 372a2188c22..adda7f425e2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2151,9 +2151,11 @@ pg_stat_replication_slots| SELECT s.slot_name,
s.mem_exceeded_count,
s.total_txns,
s.total_bytes,
+ s.slotsync_skip_count,
+ s.slotsync_last_skip_at,
s.stats_reset
FROM pg_replication_slots r,
- LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, mem_exceeded_count, total_txns, total_bytes, stats_reset)
+ LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, mem_exceeded_count, total_txns, total_bytes, slotsync_skip_count, slotsync_last_skip_at, stats_reset)
WHERE (r.datoid IS NOT NULL);
pg_stat_slru| SELECT name,
blks_zeroed,
--
2.34.1
v9-0002-Add-slotsync_skip_reason-to-pg_replication_slots.patchapplication/octet-stream; name=v9-0002-Add-slotsync_skip_reason-to-pg_replication_slots.patchDownload
From 7739ec27aa8ac66471a1018f0753bf970ba973fd Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Fri, 21 Nov 2025 14:39:47 +0530
Subject: [PATCH v9 2/2] Add slotsync_skip_reason to pg_replication_slots
This patch introduces a new column slotsync_skip_reason to the
pg_replication_slots view. It indicates the reason for the last slot
synchronization skip.
---
doc/src/sgml/system-views.sgml | 48 +++++++++++++++++++
src/backend/catalog/system_views.sql | 3 +-
src/backend/replication/logical/slotsync.c | 44 +++++++++++++++--
src/backend/replication/slot.c | 1 +
src/backend/replication/slotfuncs.c | 28 ++++++++++-
src/include/catalog/pg_proc.dat | 6 +--
src/include/replication/slot.h | 26 ++++++++++
.../recovery/t/050_slotsync_skip_stats.pl | 33 ++++++++++++-
src/test/regress/expected/rules.out | 5 +-
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 182 insertions(+), 13 deletions(-)
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 7971498fe75..79fce8ed697 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3102,6 +3102,54 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ The reason for the last slot synchronization skip. This field is set only
+ for logical slots that are being synchronized from a primary server (that
+ is, those whose <structfield>synced</structfield> field is
+ <literal>true</literal>). The value of this column has no meaning on the
+ primary server; it defaults to <literal>none</literal> for all slots, but
+ may (if leftover from a promoted standby) also have a value other than
+ <literal>none</literal>. Possible values are:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <literal>none</literal> means that the last slot synchronization
+ completed successfully.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>remote_behind</literal> means that the last slot
+ synchronization was skipped because the slot is ahead of the
+ corresponding failover slot on the primary.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>wal_not_flushed</literal> means that the last slot
+ synchronization was skipped because the standby had not flushed the
+ WAL corresponding to the confirmed flush position on the remote slot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>no_consistent_snapshot</literal> means that the last slot
+ synchronization was skipped because the standby could not build a
+ consistent snapshot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>slot_invalidated</literal> means that the last slot
+ synchronization was skipped because the slot is invalidated.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c77a5d15a2e..1445ac5a78c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1060,7 +1060,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.slotsync_skip_reason
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 052117f0481..83cc37239c7 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -149,6 +149,35 @@ typedef struct RemoteSlot
static void slotsync_failure_callback(int code, Datum arg);
static void update_synced_slots_inactive_since(void);
+/*
+ * Update slot sync skip stats. This function requires the caller to acquire
+ * the slot.
+ */
+static void
+update_slotsync_skip_stats(SlotSyncSkipReason skip_reason)
+{
+ ReplicationSlot *slot;
+
+ Assert(MyReplicationSlot);
+
+ slot = MyReplicationSlot;
+
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slot when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslotsync_skip(slot);
+
+ /* Update the slot sync skip reason */
+ if (slot->slotsync_skip_reason != skip_reason)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->slotsync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
+ }
+}
+
/*
* If necessary, update the local synced slot's metadata based on the data
* from the remote slot.
@@ -171,6 +200,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
ReplicationSlot *slot = MyReplicationSlot;
bool updated_xmin_or_lsn = false;
bool updated_config = false;
+ SlotSyncSkipReason skip_reason = SS_SKIP_NONE;
Assert(slot->data.invalidated == RS_INVAL_NONE);
@@ -220,7 +250,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
slot->data.catalog_xmin));
/* Update slot sync skip stats */
- pgstat_report_replslotsync_skip(slot);
+ update_slotsync_skip_stats(SS_SKIP_REMOTE_BEHIND);
if (remote_slot_precedes)
*remote_slot_precedes = true;
@@ -291,12 +321,15 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
* indicating slot synchronization is successful.
*/
if (found_consistent_snapshot && !(*found_consistent_snapshot))
- pgstat_report_replslotsync_skip(slot);
+ skip_reason = SS_SKIP_NO_CONSISTENT_SNAPSHOT;
}
updated_xmin_or_lsn = true;
}
+ /* Update slot sync skip stats */
+ update_slotsync_skip_stats(skip_reason);
+
if (remote_dbid != slot->data.database ||
remote_slot->two_phase != slot->data.two_phase ||
remote_slot->failover != slot->data.failover ||
@@ -673,7 +706,10 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
{
ReplicationSlotAcquire(NameStr(slot->data.name), true, false);
- pgstat_report_replslotsync_skip(slot);
+ if (slot->data.invalidated == RS_INVAL_NONE)
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
+ else
+ update_slotsync_skip_stats(SS_SKIP_INVALID);
ReplicationSlotRelease();
}
@@ -752,7 +788,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
/* Skip the sync of an invalidated slot */
if (slot->data.invalidated != RS_INVAL_NONE)
{
- pgstat_report_replslotsync_skip(slot);
+ update_slotsync_skip_stats(SS_SKIP_INVALID);
ReplicationSlotRelease();
return slot_updated;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1ec1e997b27..86ae99a3ca9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -491,6 +491,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
slot->last_saved_restart_lsn = InvalidXLogRecPtr;
slot->inactive_since = 0;
+ slot->slotsync_skip_reason = SS_SKIP_NONE;
/*
* Create the slot on disk. We haven't actually marked the slot allocated
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 0478fc9c977..7200c7f071d 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -228,6 +228,30 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * Map a SlotSyncSkipReason enum to a human-readable string
+ */
+static char *
+GetSlotSyncSkipReason(SlotSyncSkipReason reason)
+{
+ switch (reason)
+ {
+ case SS_SKIP_NONE:
+ return "none";
+ case SS_SKIP_REMOTE_BEHIND:
+ return "remote_behind";
+ case SS_SKIP_WAL_NOT_FLUSHED:
+ return "wal_not_flushed";
+ case SS_SKIP_NO_CONSISTENT_SNAPSHOT:
+ return "no_consistent_snapshot";
+ case SS_SKIP_INVALID:
+ return "slot_invalidated";
+ }
+
+ Assert(false);
+ return "none";
+}
+
/*
* pg_get_replication_slots - SQL SRF showing all replication slots
* that currently exist on the database cluster.
@@ -235,7 +259,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +467,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ values[i++] = CStringGetTextDatum(GetSlotSyncSkipReason(slot_contents.slotsync_skip_reason));
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b10809ba9b6..4205d565df3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11511,9 +11511,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slotsync_skip_reason}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 09c69f83d57..054e81b6c43 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -71,6 +71,22 @@ typedef enum ReplicationSlotInvalidationCause
/* Maximum number of invalidation causes */
#define RS_INVAL_MAX_CAUSES 4
+/*
+ * When the slot sync worker is running or pg_sync_replication_slots is
+ * executed, slot synchronization can be skipped. This enum lists the
+ * possible reasons for such a skip.
+ */
+typedef enum SlotSyncSkipReason
+{
+ SS_SKIP_NONE, /* No skip */
+ SS_SKIP_WAL_NOT_FLUSHED, /* Standby did not flush the WAL corresponding
+ * to confirmed flush of remote slot */
+ SS_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SS_SKIP_NO_CONSISTENT_SNAPSHOT, /* Standby could not build a consistent
+ * snapshot */
+ SS_SKIP_INVALID /* Local slot is invalid */
+} SlotSyncSkipReason;
+
/*
* On-Disk data of a replication slot, preserved across restarts.
*/
@@ -249,6 +265,16 @@ typedef struct ReplicationSlot
*/
XLogRecPtr last_saved_restart_lsn;
+ /*
+ * The reason for last slot sync skip.
+ *
+ * A slotsync skip typically occurs only for temporary slots. For
+ * persistent slots it is extremely rare (e.g., cases like
+ * SS_SKIP_WAL_NOT_FLUSHED or SS_SKIP_REMOTE_BEHIND). Also, temporary
+ * slots are dropped after server restart, so there is no value in
+ * persisting the slotsync_skip_reason.
+ */
+ SlotSyncSkipReason slotsync_skip_reason;
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
diff --git a/src/test/recovery/t/050_slotsync_skip_stats.pl b/src/test/recovery/t/050_slotsync_skip_stats.pl
index 39ce9ef702b..59b512bc116 100644
--- a/src/test/recovery/t/050_slotsync_skip_stats.pl
+++ b/src/test/recovery/t/050_slotsync_skip_stats.pl
@@ -66,7 +66,13 @@ $primary->wait_for_replay_catchup($standby);
$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
# Verify slot is synced successfully
-my $result = $standby->safe_psql('postgres',
+my $result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced"
+);
+is($result, 'none', "slot sync reason is none");
+$result = $standby->safe_psql('postgres',
"SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
);
is($result, '0', "check slot sync skip count after initial sync");
@@ -102,7 +108,14 @@ ok( $stderr =~
qr/skipping slot synchronization because the received slot sync.*is ahead of the standby position/,
'pg_sync_replication_slots failed as expected');
-# Check slot sync skip count when standby is behind
+# Check skip reason and count when standby is behind
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'wal_not_flushed', "slot sync skip when standby is behind");
+
$result = $standby->safe_psql('postgres',
"SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
);
@@ -129,6 +142,15 @@ $primary->restart;
# Wait for standby to catch up
$primary->wait_for_replay_catchup($standby);
+# Check that skip reason is reset after successful sync
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'none', "slotsync_skip_reason is reset after successful sync");
+
# Cleanup: drop the logical slot and ensure standby catches up
$primary->safe_psql('postgres',
"SELECT pg_drop_replication_slot('slot_sync')");
@@ -159,6 +181,13 @@ select pg_sync_replication_slots();
$standby->wait_for_event('client backend', 'slot-sync-skip');
# Logical slot is temporary and sync will skip because remote is behind
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND temporary"
+);
+is($result, 'remote_behind', "slot sync skip as remote is behind");
+
$result = $standby->safe_psql('postgres',
"SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index adda7f425e2..feac3e4c089 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1507,8 +1507,9 @@ pg_replication_slots| SELECT l.slot_name,
l.conflicting,
l.invalidation_reason,
l.failover,
- l.synced
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
+ l.synced,
+ l.slotsync_skip_reason
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, slotsync_skip_reason)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57f2a9ccdc5..435927e5638 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2802,6 +2802,7 @@ SlabSlot
SlotInvalidationCauseMap
SlotNumber
SlotSyncCtxStruct
+SlotSyncSkipReason
SlruCtl
SlruCtlData
SlruErrorCause
--
2.34.1
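With the v9-0002 patch above applied on a standby, an end user could check the proposed column directly; a minimal sketch (the filter on synced slots is illustrative):

```sql
-- Show, for each synced slot on the standby, whether the last
-- synchronization cycle was skipped and why ('none' means it succeeded).
SELECT slot_name,
       synced,
       slotsync_skip_reason
  FROM pg_replication_slots
 WHERE synced;
```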
On Fri, 21 Nov 2025 at 09:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 21, 2025 at 8:52 AM shveta malik <shveta.malik@gmail.com> wrote:
On Tue, Nov 18, 2025 at 4:07 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
On Fri, 14 Nov 2025 at 14:13, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Shlok,
Thanks for updating the patch. Few more comments.
I’m not sure if this has already been discussed; I couldn’t find any
mention of it in the thread. Why don’t we persist
'slot_sync_skip_reason' (it is outside of
ReplicationSlotPersistentData)? If a slot wasn’t synced during the
last cycle and the server restarts, it would be helpful to know the
reason it wasn’t synced prior to the node restart.
Actually I did not think in this direction. I think it will be useful
to persist 'slot_sync_skip_reason'. I have made the change for the
same in the latest patch.
Hmm, I'm wondering it should be written on the disk. Other attributes on the disk
are essential to decode or replicate changes correctly, but sync status is not
used for the purpose. Personally considered, slot sync would re-start soon after
the reboot so that it is OK to start with empty. How about others?
If we want to serialize the info, we should do further tasks:
- update SLOT_VERSION
- make the slot dirty then SaveSlotToPath() when the status is updated.
I agree with your point. Slot synchronization will restart shortly
after a reboot, so it seems reasonable to begin with an empty state
rather than persisting slot_sync_skip_reason.
For now, I’ve updated the patch so that slot_sync_skip_reason is no
longer persisted; its initialization is kept outside of
ReplicationSlotPersistentData. I’d also like to hear what others
think.
Users may even use an API to synchronize the slots rather than
slotsync worker. In that case synchronization won't start immediately
after server-restart.
But I think after restart in most cases, the slot will be created
fresh as we persist the slot for the first time only when sync is
successful. Now, when the standby has not flushed the WAL
corresponding to remote_lsn (SS_SKIP_WAL_NOT_FLUSHED), the slotsync
can be skipped even for persisted slots but that should be rare and we
anyway won't be able to persist the slot_skip reason in other cases as
slot itself won't be persisted by that time. So, I feel keeping the
slot_sync_skip_reason in memory is sufficient.
I agree.
I have added a comment explaining why we decided to keep slotsync_skip_reason in memory.
Attached the latest patch in [1].
[1]: /messages/by-id/CANhcyEUiY4ENuoi7kZSsLJFLn6yA_-oPCKrek=BaMfFfY3=P1w@mail.gmail.com
Thanks
Shlok Kyal
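Combining the columns proposed in the two patches, a user could correlate the reason for a sync skip with how often and when skips occurred; a hedged sketch, assuming both patches are applied:

```sql
-- Join the columns from both proposed views: reason for the last skip,
-- total skip count, and the time of the most recent skip.
SELECT r.slot_name,
       r.slotsync_skip_reason,
       s.slotsync_skip_count,
       s.slotsync_last_skip_at
  FROM pg_replication_slots r
  JOIN pg_stat_replication_slots s USING (slot_name)
 WHERE r.synced;
```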
On Fri, 21 Nov 2025 at 11:30, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 21, 2025 at 11:00 AM shveta malik <shveta.malik@gmail.com> wrote:
A few comments on 001:
1)
+ slots, but may (if leftover from a promotedstandby) contain a
timestamp.
promotedstandby --> promoted standby
2) + s.slotsync_skip_count,
+ s.last_slotsync_skip_at,
Shall we rename last_slotsync_skip_at to slotsync_last_skip_at. That
way all slotsync related stats columns will have same prefix.
Sounds reasonable especially when the doc explains that this is the
time at which last slot synchronization was skipped.
Made the change in the latest patch.
BTW, can we split the patch into two? First for slot sync skip stats,
and the second one for SlotSyncSkipReason? It would be easier to
review and commit that way.
I have split the patch into two:
0001 - Adds columns slotsync_skip_count and slotsync_last_skip_at in
pg_stats_replication_slots view
0002 - Adds column slotsync_skip_reason in pg_replication_slots view
Please find the latest patch in [1].
[1]: /messages/by-id/CANhcyEUiY4ENuoi7kZSsLJFLn6yA_-oPCKrek=BaMfFfY3=P1w@mail.gmail.com
Thanks,
Shlok Kyal
On Fri, 21 Nov 2025 at 15:41, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
On Fri, 21 Nov 2025 at 11:00, shveta malik <shveta.malik@gmail.com> wrote:
On Fri, Nov 21, 2025 at 9:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 21, 2025 at 8:52 AM shveta malik <shveta.malik@gmail.com> wrote:
On Tue, Nov 18, 2025 at 4:07 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
On Fri, 14 Nov 2025 at 14:13, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Shlok,
Thanks for updating the patch. Few more comments.
I’m not sure if this has already been discussed; I couldn’t find any
mention of it in the thread. Why don’t we persist
'slot_sync_skip_reason' (it is outside of
ReplicationSlotPersistentData)? If a slot wasn’t synced during the
last cycle and the server restarts, it would be helpful to know the
reason it wasn’t synced prior to the node restart.
Actually I did not think in this direction. I think it will be useful
to persist 'slot_sync_skip_reason'. I have made the change for the
same in the latest patch.
Hmm, I'm wondering it should be written on the disk. Other attributes on the disk
are essential to decode or replicate changes correctly, but sync status is not
used for the purpose. Personally considered, slot sync would re-start soon after
the reboot so that it is OK to start with empty. How about others?
If we want to serialize the info, we should do further tasks:
- update SLOT_VERSION
- make the slot dirty then SaveSlotToPath() when the status is updated.
I agree with your point. Slot synchronization will restart shortly
after a reboot, so it seems reasonable to begin with an empty state
rather than persisting slot_sync_skip_reason.
For now, I’ve updated the patch so that slot_sync_skip_reason is no
longer persisted; its initialization is kept outside of
ReplicationSlotPersistentData. I’d also like to hear what others
think.
Users may even use an API to synchronize the slots rather than
slotsync worker. In that case synchronization won't start immediately
after server-restart.
But I think after restart in most cases, the slot will be created
fresh as we persist the slot for the first time only when sync is
successful. Now, when the standby has not flushed the WAL
corresponding to remote_lsn (SS_SKIP_WAL_NOT_FLUSHED), the slotsync
can be skipped even for persisted slots but that should be rare and we
anyway won't be able to persist the slot_skip reason in other cases as
slot itself won't be persisted by that time. So, I feel keeping the
slot_sync_skip_reason in memory is sufficient.
Okay, makes sense.
A few comments on 001:
1)
+ slots, but may (if leftover from a promotedstandby) contain a
timestamp.
promotedstandby --> promoted standby
2) + s.slotsync_skip_count,
+ s.last_slotsync_skip_at,
Shall we rename last_slotsync_skip_at to slotsync_last_skip_at. That
way all slotsync related stats columns will have same prefix.
3)
+#include "utils/injection_point.h"
+ INJECTION_POINT("slot-sync-skip", NULL);
I think we can move both to patch 002 as these are needed for test alone.
I have merged the test patch in the main patch and split the patch as
per Amit's suggestion in [1].
So this change will not be required.
4)
+ /*
+ * If found_consistent_snapshot is not NULL, a true value means
+ * the slot synchronization was successful, while a false value
+ * means it was skipped (see
+ * update_and_persist_local_synced_slot()). If
+ * found_consistent_snapshot is NULL, no such check exists, so the
+ * stats can be updated directly.
+ */
+ if (!found_consistent_snapshot || *found_consistent_snapshot)
+ update_slotsync_skip_stats(SS_SKIP_NONE);
I see that when 'found_consistent_snapshot' is true we update stats
here but when it is false, we update stats in the caller. Also for
'remote_slot_precedes' case, we update stats to SS_SKIP_REMOTE_BEHIND
here itself. I think for 'SS_SKIP_NO_CONSISTENT_SNAPSHOT' as well, we
should update stats here instead of caller.
We can do this:
update_local_synced_slot()
{
skip_reason = none;
if (remote is behind)
skip_reason = SS_SKIP_REMOTE_BEHIND;
if (found_consistent_snapshot && (*found_consistent_snapshot == false))
skip_reason = SS_SKIP_NO_CONSISTENT_SNAPSHOT;
-- Later in this function, when syncing is done:
update_slotsync_skip_stats(skip_reason)
}
I agree with your suggestion that we should update the stats for case
of 'SS_SKIP_NO_CONSISTENT_SNAPSHOT' in update_local_synced_slot
instead of update_and_persist_local_synced_slot. I have made the
change for the same.
For your second suggestion of using the 'skip_reason' variable instead:
For the 'if (remote is behind)' we are returning inside this if
condition itself. So for this case we need to call function
update_slotsync_skip_stats directly. But for case of
'found_consistent_snapshot' we can optimise it.5) + if (synced) + { + ReplicationSlotAcquire(NameStr(slot->data.name), true, false); + + if (slot->data.invalidated == RS_INVAL_NONE) + update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED); + + ReplicationSlotRelease(); + }Shall we check 'slot->data.invalidated' along with 'synced'
condition. That way, no need to acquire or release the slot if it is
invalidated. We can fetch 'invalidated' under the same SpinLock
itself.
Also I think to check 'slot->data.invalidated' we need to acquire the
slot because a similar race condition can occur as described in comments
below:
* The slot has been synchronized before.
*
* It is important to acquire the slot here before checking
* invalidation. If we don't acquire the slot first, there could be a
* race condition that the local slot could be invalidated just after
* checking the 'invalidated' flag here and we could end up
* overwriting 'invalidated' flag to remote_slot's value. See
* InvalidatePossiblyObsoleteSlot() where it invalidates slot directly
* if the slot is not acquired by other processes.
*I thought about it and I agree with your suggestion in [2] to add a
new reason in slotsync_skip_reason to indicate the slot skip is
happening due to an invalidated slot.
I think it is cleaner and would avoid confusion for users. I made the
changes for the same in the latest patch.
6)
+ SS_SKIP_WAL_NOT_FLUSHED, /* Standby did not flush the wal coresponding
+ * to confirmed flush on remote slot */
on --> of
coresponding --> corresponding
I have also addressed the remaining comments. I have attached the
updated v9 patches.
[1]: /messages/by-id/CAA4eK1JM+roQMyXekvwJprMMaK_-HL+n5twinZQ8fufnDEU28g@mail.gmail.com
[2]: /messages/by-id/CAJpy0uC9WsJhc-qeFmz_JTPjW1vH3Zm+zS6jX0PKTY6vtEp38w@mail.gmail.com
The CFbot complained that it was not able to build the docs. I have
fixed it and attached the latest patch.
Thanks
Shlok Kyal
Attachments:
v10-0001-Add-slotsync-skip-statistics.patch (application/octet-stream)
From d7133f583cf5fd101777b91bd85d4f0ecad8897a Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Fri, 21 Nov 2025 12:35:46 +0530
Subject: [PATCH v10 1/2] Add slotsync skip statistics
This patch introduces two new columns, slotsync_skip_count and
slotsync_last_skip_at, to the pg_stat_replication_slots view. These
columns indicate the number of times slot synchronization was skipped
and the time at which it was last skipped.
---
contrib/test_decoding/expected/stats.out | 12 +-
doc/src/sgml/monitoring.sgml | 25 +++
src/backend/catalog/system_views.sql | 2 +
src/backend/replication/logical/slotsync.c | 49 +++++
src/backend/utils/activity/pgstat_replslot.c | 25 +++
src/backend/utils/adt/pgstatfuncs.c | 18 +-
src/include/catalog/pg_proc.dat | 6 +-
src/include/pgstat.h | 3 +
src/test/recovery/meson.build | 1 +
.../recovery/t/050_slotsync_skip_stats.pl | 176 ++++++++++++++++++
src/test/regress/expected/rules.out | 4 +-
11 files changed, 307 insertions(+), 14 deletions(-)
create mode 100644 src/test/recovery/t/050_slotsync_skip_stats.pl
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index 28da9123cc8..d0749bc0daf 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -78,17 +78,17 @@ SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count,
-- verify accessing/resetting stats for non-existent slot does something reasonable
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slotsync_skip_count | slotsync_last_skip_at | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+---------------------+-----------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
SELECT pg_stat_reset_replication_slot('do-not-exist');
ERROR: replication slot "do-not-exist" does not exist
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slotsync_skip_count | slotsync_last_skip_at | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+---------------------+-----------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
-- spilling the xact
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 436ef0e8bd0..b054ebb4ade 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1659,6 +1659,31 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_skip_count</structfield><type>bigint</type>
+ </para>
+ <para>
+ Number of times the slot synchronization is skipped. The value of this
+ column has no meaning on the primary server; it defaults to 0 for all
+ slots, but may (if leftover from a promoted standby) also have a
+ positive value.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_last_skip_at</structfield><type>timestamp with time zone</type>
+ </para>
+ <para>
+ Time at which last slot synchronization was skipped. The value of this
+ column has no meaning on the primary server; it defaults to NULL for all
+ slots, but may (if leftover from a promoted standby) contain a timestamp.
+ </para>
+ </entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 95ad29a64b9..c77a5d15a2e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1076,6 +1076,8 @@ CREATE VIEW pg_stat_replication_slots AS
s.mem_exceeded_count,
s.total_txns,
s.total_bytes,
+ s.slotsync_skip_count,
+ s.slotsync_last_skip_at,
s.stats_reset
FROM pg_replication_slots as r,
LATERAL pg_stat_get_replication_slot(slot_name) as s
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8b4afd87dc9..052117f0481 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -64,6 +64,7 @@
#include "storage/procarray.h"
#include "tcop/tcopprot.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/pg_lsn.h"
#include "utils/ps_status.h"
#include "utils/timeout.h"
@@ -218,6 +219,9 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
LSN_FORMAT_ARGS(slot->data.restart_lsn),
slot->data.catalog_xmin));
+ /* Update slot sync skip stats */
+ pgstat_report_replslotsync_skip(slot);
+
if (remote_slot_precedes)
*remote_slot_precedes = true;
@@ -277,6 +281,17 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
errdetail_internal("Remote slot has LSN %X/%08X but local slot has LSN %X/%08X.",
LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
LSN_FORMAT_ARGS(slot->data.confirmed_flush)));
+
+ /*
+ * If found_consistent_snapshot is not NULL, a true value means
+ * the slot synchronization was successful, while a false value
+ * means it was skipped (see
+ * update_and_persist_local_synced_slot()). If
+ * found_consistent_snapshot is NULL, no such check exists,
+ * indicating slot synchronization is successful.
+ */
+ if (found_consistent_snapshot && !(*found_consistent_snapshot))
+ pgstat_report_replslotsync_skip(slot);
}
updated_xmin_or_lsn = true;
@@ -580,6 +595,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
* current location when recreating the slot in the next cycle. It may
* take more time to create such a slot. Therefore, we keep this slot
* and attempt the synchronization in the next cycle.
+ *
+ * We do not need to update the slot sync skip stats here as it will
+ * be already updated in function update_local_synced_slot.
*/
return false;
}
@@ -590,6 +608,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
*/
if (!found_consistent_snapshot)
{
+ /*
+ * We do not need to update the slot sync skip stats here as it will
+ * be already updated in function update_local_synced_slot.
+ */
ereport(LOG,
errmsg("could not synchronize replication slot \"%s\"", remote_slot->name),
errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
@@ -600,6 +622,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlotPersist();
+ /*
+ * For the success case we do not update the slot sync skip stats here,
+ * as they are already updated in update_local_synced_slot().
+ */
ereport(LOG,
errmsg("newly created replication slot \"%s\" is sync-ready now",
remote_slot->name));
@@ -634,6 +660,25 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
latestFlushPtr = GetStandbyFlushRecPtr(NULL);
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
+ /* If the slot is present locally, update the slot sync skip stats */
+ if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ {
+ bool synced;
+
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
+
+ if (synced)
+ {
+ ReplicationSlotAcquire(NameStr(slot->data.name), true, false);
+
+ pgstat_report_replslotsync_skip(slot);
+
+ ReplicationSlotRelease();
+ }
+ }
+
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
* primary server was not configured correctly.
@@ -707,6 +752,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
/* Skip the sync of an invalidated slot */
if (slot->data.invalidated != RS_INVAL_NONE)
{
+ pgstat_report_replslotsync_skip(slot);
+
ReplicationSlotRelease();
return slot_updated;
}
@@ -939,6 +986,8 @@ synchronize_slots(WalReceiverConn *wrconn)
if (started_tx)
CommitTransactionCommand();
+ INJECTION_POINT("slot-sync-skip", NULL);
+
return some_slot_updated;
}
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index d210c261ac6..d9cc4ec2314 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -102,6 +102,31 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
pgstat_unlock_entry(entry_ref);
}
+/*
+ * Report replication slot sync skip statistics.
+ *
+ * We can rely on the stats for the slot to exist and to belong to this
+ * slot. We can only get here if pgstat_create_replslot() or
+ * pgstat_acquire_replslot() have already been called.
+ */
+void
+pgstat_report_replslotsync_skip(ReplicationSlot *slot)
+{
+ PgStat_EntryRef *entry_ref;
+ PgStatShared_ReplSlot *shstatent;
+ PgStat_StatReplSlotEntry *statent;
+
+ entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
+ ReplicationSlotIndex(slot), false);
+ shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
+ statent = &shstatent->stats;
+
+ statent->slotsync_skip_count += 1;
+ statent->slotsync_last_skip_at = GetCurrentTimestamp();
+
+ pgstat_unlock_entry(entry_ref);
+}
+
/*
* Report replication slot creation.
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 3d98d064a94..46e103ce7c3 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2129,7 +2129,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
Datum
pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 11
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 13
text *slotname_text = PG_GETARG_TEXT_P(0);
NameData slotname;
TupleDesc tupdesc;
@@ -2160,7 +2160,11 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 10, "total_bytes",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 11, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 11, "slotsync_skip_count",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 12, "slotsync_last_skip_at",
+ TIMESTAMPTZOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 13, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
@@ -2186,11 +2190,17 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
values[7] = Int64GetDatum(slotent->mem_exceeded_count);
values[8] = Int64GetDatum(slotent->total_txns);
values[9] = Int64GetDatum(slotent->total_bytes);
+ values[10] = Int64GetDatum(slotent->slotsync_skip_count);
+
+ if (slotent->slotsync_last_skip_at == 0)
+ nulls[11] = true;
+ else
+ values[11] = TimestampTzGetDatum(slotent->slotsync_last_skip_at);
if (slotent->stat_reset_timestamp == 0)
- nulls[10] = true;
+ nulls[12] = true;
else
- values[10] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+ values[12] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index aaadfd8c748..b10809ba9b6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5691,9 +5691,9 @@
{ oid => '6169', descr => 'statistics: information about replication slot',
proname => 'pg_stat_get_replication_slot', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => 'text',
- proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,mem_exceeded_count,total_txns,total_bytes,stats_reset}',
+ proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,mem_exceeded_count,total_txns,total_bytes,slotsync_skip_count,slotsync_last_skip_at,stats_reset}',
prosrc => 'pg_stat_get_replication_slot' },
{ oid => '6230', descr => 'statistics: check if a stats object exists',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a68e725259a..144042c1940 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -400,6 +400,8 @@ typedef struct PgStat_StatReplSlotEntry
PgStat_Counter mem_exceeded_count;
PgStat_Counter total_txns;
PgStat_Counter total_bytes;
+ PgStat_Counter slotsync_skip_count;
+ TimestampTz slotsync_last_skip_at;
TimestampTz stat_reset_timestamp;
} PgStat_StatReplSlotEntry;
@@ -745,6 +747,7 @@ extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
extern void pgstat_reset_replslot(const char *name);
struct ReplicationSlot;
extern void pgstat_report_replslot(struct ReplicationSlot *slot, const PgStat_StatReplSlotEntry *repSlotStat);
+extern void pgstat_report_replslotsync_skip(struct ReplicationSlot *slot);
extern void pgstat_create_replslot(struct ReplicationSlot *slot);
extern void pgstat_acquire_replslot(struct ReplicationSlot *slot);
extern void pgstat_drop_replslot(struct ReplicationSlot *slot);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 523a5cd5b52..17551cf114a 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -58,6 +58,7 @@ tests += {
't/047_checkpoint_physical_slot.pl',
't/048_vacuum_horizon_floor.pl',
't/049_wait_for_lsn.pl',
+ 't/050_slotsync_skip_stats.pl',
],
},
}
diff --git a/src/test/recovery/t/050_slotsync_skip_stats.pl b/src/test/recovery/t/050_slotsync_skip_stats.pl
new file mode 100644
index 00000000000..39ce9ef702b
--- /dev/null
+++ b/src/test/recovery/t/050_slotsync_skip_stats.pl
@@ -0,0 +1,176 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Skip all tests if injection points are not supported in this build
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize the primary cluster
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 'logical');
+$primary->append_conf(
+ 'postgresql.conf', qq{
+autovacuum = off
+});
+$primary->start;
+
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$primary->check_extension('injection_points'))
+{
+ plan skip_all => 'Extension injection_points not installed';
+}
+
+# Load the injection_points extension
+$primary->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Take a backup of the primary for standby initialization
+my $backup_name = 'backup';
+$primary->backup($backup_name);
+
+# Initialize standby from primary backup
+my $standby = PostgreSQL::Test::Cluster->new('standby');
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+
+my $connstr = $primary->connstr;
+$standby->append_conf(
+ 'postgresql.conf', qq(
+hot_standby_feedback = on
+primary_slot_name = 'sb1_slot'
+primary_conninfo = '$connstr dbname=postgres'
+));
+
+# Create a physical replication slot on primary for standby
+$primary->safe_psql('postgres',
+ q{SELECT pg_create_physical_replication_slot('sb1_slot');});
+
+$standby->start;
+
+# Create a logical replication slot on primary for testing
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Wait for standby to catch up
+$primary->wait_for_replay_catchup($standby);
+
+# Initial sync of replication slots
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Verify slot is synced successfully
+my $result = $standby->safe_psql('postgres',
+ "SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '0', "check slot sync skip count after initial sync");
+
+# Update pg_hba.conf and restart the primary to reject streaming replication
+# connections. WAL records won't be replicated to the standby until the
+# configuration is restored.
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf(
+ 'pg_hba.conf', qq{
+local all all trust
+host all all 127.0.0.1/32 trust
+host all all ::1/128 trust
+});
+$primary->restart;
+
+# Advance the failover slot so that the confirmed flush LSN of the remote
+# slot gets ahead of the standby's flushed LSN
+$primary->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE t1(a int);
+ INSERT INTO t1 VALUES(1);
+ SELECT pg_replication_slot_advance('slot_sync', pg_current_wal_lsn());
+));
+
+my ($stdout, $stderr);
+# Attempt to sync replication slots while standby is behind
+($result, $stdout, $stderr) =
+ $standby->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Verify pg_sync_replication_slots is failing
+ok( $stderr =~
+ qr/skipping slot synchronization because the received slot sync.*is ahead of the standby position/,
+ 'pg_sync_replication_slots failed as expected');
+
+# Check slot sync skip count when standby is behind
+$result = $standby->safe_psql('postgres',
+ "SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Repeat sync to ensure skip count increments
+($result, $stdout, $stderr) =
+ $standby->psql('postgres', "SELECT pg_sync_replication_slots();");
+
+$result = $standby->safe_psql('postgres',
+ "SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '2', "check slot sync skip count");
+
+# Restore streaming replication connection
+$primary->append_conf(
+ 'pg_hba.conf', qq{
+local replication all trust
+host replication all 127.0.0.1/32 trust
+host replication all ::1/128 trust
+});
+$primary->restart;
+
+# Wait for standby to catch up
+$primary->wait_for_replay_catchup($standby);
+
+# Cleanup: drop the logical slot and ensure standby catches up
+$primary->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('slot_sync')");
+$primary->wait_for_replay_catchup($standby);
+
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Test the case where slot sync is skipped because the remote slot is
+# behind the local slot.
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
+
+# Attach injection point to simulate wait
+my $standby_psql = $standby->background_psql('postgres');
+$standby_psql->query_safe(
+ q(select injection_points_attach('slot-sync-skip','wait')));
+
+# Initiate sync of failover slots
+$standby_psql->query_until(
+ qr/slot_sync/,
+ q(
+\echo slot_sync
+select pg_sync_replication_slots();
+));
+
+# Wait for backend to reach injection point
+$standby->wait_for_event('client backend', 'slot-sync-skip');
+
+# Logical slot is temporary and sync will skip because remote is behind
+$result = $standby->safe_psql('postgres',
+ "SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
+);
+is($result, '1', "check slot sync skip count");
+
+# Detach injection point
+$standby->safe_psql(
+ 'postgres', q{
+ SELECT injection_points_detach('slot-sync-skip');
+ SELECT injection_points_wakeup('slot-sync-skip');
+});
+
+$standby_psql->quit;
+
+done_testing();
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 372a2188c22..adda7f425e2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2151,9 +2151,11 @@ pg_stat_replication_slots| SELECT s.slot_name,
s.mem_exceeded_count,
s.total_txns,
s.total_bytes,
+ s.slotsync_skip_count,
+ s.slotsync_last_skip_at,
s.stats_reset
FROM pg_replication_slots r,
- LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, mem_exceeded_count, total_txns, total_bytes, stats_reset)
+ LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, mem_exceeded_count, total_txns, total_bytes, slotsync_skip_count, slotsync_last_skip_at, stats_reset)
WHERE (r.datoid IS NOT NULL);
pg_stat_slru| SELECT name,
blks_zeroed,
--
2.34.1
Attachment: v10-0002-Add-slotsync_skip_reason-to-pg_replication_slots.patch
From a7c6526eb30b1f5a47f210f950b3876d7b536ac9 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Fri, 21 Nov 2025 14:39:47 +0530
Subject: [PATCH v10 2/2] Add slotsync_skip_reason to pg_replication_slots
This patch introduces a new column slotsync_skip_reason to the
pg_replication_slots view. It indicates the reason for the last slot
synchronization skip.
---
doc/src/sgml/system-views.sgml | 49 +++++++++++++++++++
src/backend/catalog/system_views.sql | 3 +-
src/backend/replication/logical/slotsync.c | 44 +++++++++++++++--
src/backend/replication/slot.c | 1 +
src/backend/replication/slotfuncs.c | 28 ++++++++++-
src/include/catalog/pg_proc.dat | 6 +--
src/include/replication/slot.h | 26 ++++++++++
.../recovery/t/050_slotsync_skip_stats.pl | 33 ++++++++++++-
src/test/regress/expected/rules.out | 5 +-
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 183 insertions(+), 13 deletions(-)
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 7971498fe75..ec797d2d916 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3102,6 +3102,55 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ The reason for the last slot synchronization skip. This field is set only
+ for logical slots that are being synchronized from a primary server (that
+ is, those whose <structfield>synced</structfield> field is
+ <literal>true</literal>). The value of this column has no meaning on the
+ primary server; it defaults to <literal>none</literal> for all slots,
+ but may also carry a value other than <literal>none</literal> if left
+ over from a promoted standby. Possible values are:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <literal>none</literal> means that the last slot synchronization
+ completed successfully.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>remote_behind</literal> means that the last slot
+ synchronization was skipped because the slot is ahead of the
+ corresponding failover slot on the primary.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>wal_not_flushed</literal> means that the last slot
+ synchronization was skipped because the standby had not flushed the
+ WAL corresponding to the confirmed flush position on the remote slot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>no_consistent_snapshot</literal> means that the last slot
+ synchronization was skipped because the standby could not build a
+ consistent snapshot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>slot_invalidated</literal> means that the last slot
+ synchronization was skipped because the slot is invalidated.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c77a5d15a2e..1445ac5a78c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1060,7 +1060,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.slotsync_skip_reason
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 052117f0481..83cc37239c7 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -149,6 +149,35 @@ typedef struct RemoteSlot
static void slotsync_failure_callback(int code, Datum arg);
static void update_synced_slots_inactive_since(void);
+/*
+ * Update slot sync skip stats. The caller must have acquired the slot.
+ */
+static void
+update_slotsync_skip_stats(SlotSyncSkipReason skip_reason)
+{
+ ReplicationSlot *slot;
+
+ Assert(MyReplicationSlot);
+
+ slot = MyReplicationSlot;
+
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slot when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslotsync_skip(slot);
+
+ /* Update the slot sync skip reason */
+ if (slot->slotsync_skip_reason != skip_reason)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->slotsync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
+ }
+}
+
/*
* If necessary, update the local synced slot's metadata based on the data
* from the remote slot.
@@ -171,6 +200,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
ReplicationSlot *slot = MyReplicationSlot;
bool updated_xmin_or_lsn = false;
bool updated_config = false;
+ SlotSyncSkipReason skip_reason = SS_SKIP_NONE;
Assert(slot->data.invalidated == RS_INVAL_NONE);
@@ -220,7 +250,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
slot->data.catalog_xmin));
/* Update slot sync skip stats */
- pgstat_report_replslotsync_skip(slot);
+ update_slotsync_skip_stats(SS_SKIP_REMOTE_BEHIND);
if (remote_slot_precedes)
*remote_slot_precedes = true;
@@ -291,12 +321,15 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
* indicating slot synchronization is successful.
*/
if (found_consistent_snapshot && !(*found_consistent_snapshot))
- pgstat_report_replslotsync_skip(slot);
+ skip_reason = SS_SKIP_NO_CONSISTENT_SNAPSHOT;
}
updated_xmin_or_lsn = true;
}
+ /* Update slot sync skip stats */
+ update_slotsync_skip_stats(skip_reason);
+
if (remote_dbid != slot->data.database ||
remote_slot->two_phase != slot->data.two_phase ||
remote_slot->failover != slot->data.failover ||
@@ -673,7 +706,10 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
{
ReplicationSlotAcquire(NameStr(slot->data.name), true, false);
- pgstat_report_replslotsync_skip(slot);
+ if (slot->data.invalidated == RS_INVAL_NONE)
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
+ else
+ update_slotsync_skip_stats(SS_SKIP_INVALID);
ReplicationSlotRelease();
}
@@ -752,7 +788,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
/* Skip the sync of an invalidated slot */
if (slot->data.invalidated != RS_INVAL_NONE)
{
- pgstat_report_replslotsync_skip(slot);
+ update_slotsync_skip_stats(SS_SKIP_INVALID);
ReplicationSlotRelease();
return slot_updated;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1ec1e997b27..86ae99a3ca9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -491,6 +491,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
slot->last_saved_restart_lsn = InvalidXLogRecPtr;
slot->inactive_since = 0;
+ slot->slotsync_skip_reason = SS_SKIP_NONE;
/*
* Create the slot on disk. We haven't actually marked the slot allocated
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 0478fc9c977..7200c7f071d 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -228,6 +228,30 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * Map a SlotSyncSkipReason enum to a human-readable string
+ */
+static char *
+GetSlotSyncSkipReason(SlotSyncSkipReason reason)
+{
+ switch (reason)
+ {
+ case SS_SKIP_NONE:
+ return "none";
+ case SS_SKIP_REMOTE_BEHIND:
+ return "remote_behind";
+ case SS_SKIP_WAL_NOT_FLUSHED:
+ return "wal_not_flushed";
+ case SS_SKIP_NO_CONSISTENT_SNAPSHOT:
+ return "no_consistent_snapshot";
+ case SS_SKIP_INVALID:
+ return "slot_invalidated";
+ }
+
+ Assert(false);
+ return "none";
+}
+
/*
* pg_get_replication_slots - SQL SRF showing all replication slots
* that currently exist on the database cluster.
@@ -235,7 +259,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +467,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ values[i++] = CStringGetTextDatum(GetSlotSyncSkipReason(slot_contents.slotsync_skip_reason));
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b10809ba9b6..4205d565df3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11511,9 +11511,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slotsync_skip_reason}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 09c69f83d57..054e81b6c43 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -71,6 +71,22 @@ typedef enum ReplicationSlotInvalidationCause
/* Maximum number of invalidation causes */
#define RS_INVAL_MAX_CAUSES 4
+/*
+ * While the slot sync worker is running or pg_sync_replication_slots() is
+ * executed, slot synchronization can be skipped. This enum lists the
+ * possible reasons for such a skip.
+ */
+typedef enum SlotSyncSkipReason
+{
+ SS_SKIP_NONE, /* No skip */
+ SS_SKIP_WAL_NOT_FLUSHED, /* Standby has not flushed the WAL
+ * corresponding to the confirmed flush LSN
+ * of the remote slot */
+ SS_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SS_SKIP_NO_CONSISTENT_SNAPSHOT, /* Standby could not build a consistent
+ * snapshot */
+ SS_SKIP_INVALID /* Local slot is invalid */
+} SlotSyncSkipReason;
+
/*
* On-Disk data of a replication slot, preserved across restarts.
*/
@@ -249,6 +265,16 @@ typedef struct ReplicationSlot
*/
XLogRecPtr last_saved_restart_lsn;
+ /*
+ * The reason for the last slot sync skip.
+ *
+ * A slotsync skip typically occurs only for temporary slots. For
+ * persistent slots it is extremely rare (e.g., cases like
+ * SS_SKIP_WAL_NOT_FLUSHED or SS_SKIP_REMOTE_BEHIND). Also, temporary
+ * slots are dropped after server restart, so there is no value in
+ * persisting the slotsync_skip_reason.
+ */
+ SlotSyncSkipReason slotsync_skip_reason;
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
diff --git a/src/test/recovery/t/050_slotsync_skip_stats.pl b/src/test/recovery/t/050_slotsync_skip_stats.pl
index 39ce9ef702b..59b512bc116 100644
--- a/src/test/recovery/t/050_slotsync_skip_stats.pl
+++ b/src/test/recovery/t/050_slotsync_skip_stats.pl
@@ -66,7 +66,13 @@ $primary->wait_for_replay_catchup($standby);
$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
# Verify slot is synced successfully
-my $result = $standby->safe_psql('postgres',
+my $result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced"
+);
+is($result, 'none', "slot sync reason is none");
+$result = $standby->safe_psql('postgres',
"SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
);
is($result, '0', "check slot sync skip count after initial sync");
@@ -102,7 +108,14 @@ ok( $stderr =~
qr/skipping slot synchronization because the received slot sync.*is ahead of the standby position/,
'pg_sync_replication_slots failed as expected');
-# Check slot sync skip count when standby is behind
+# Check skip reason and count when standby is behind
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'wal_not_flushed', "slot sync skip when standby is behind");
+
$result = $standby->safe_psql('postgres',
"SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
);
@@ -129,6 +142,15 @@ $primary->restart;
# Wait for standby to catch up
$primary->wait_for_replay_catchup($standby);
+# Check that skip reason is reset after successful sync
+$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
+);
+is($result, 'none', "slotsync_skip_reason is reset after successful sync");
+
# Cleanup: drop the logical slot and ensure standby catches up
$primary->safe_psql('postgres',
"SELECT pg_drop_replication_slot('slot_sync')");
@@ -159,6 +181,13 @@ select pg_sync_replication_slots();
$standby->wait_for_event('client backend', 'slot-sync-skip');
# Logical slot is temporary and sync will skip because remote is behind
+$result = $standby->safe_psql(
+ 'postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots
+ WHERE slot_name = 'slot_sync' AND synced AND temporary"
+);
+is($result, 'remote_behind', "slot sync skip as remote is behind");
+
$result = $standby->safe_psql('postgres',
"SELECT slotsync_skip_count FROM pg_stat_replication_slots WHERE slot_name = 'slot_sync'"
);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index adda7f425e2..feac3e4c089 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1507,8 +1507,9 @@ pg_replication_slots| SELECT l.slot_name,
l.conflicting,
l.invalidation_reason,
l.failover,
- l.synced
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
+ l.synced,
+ l.slotsync_skip_reason
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, slotsync_skip_reason)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57f2a9ccdc5..435927e5638 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2802,6 +2802,7 @@ SlabSlot
SlotInvalidationCauseMap
SlotNumber
SlotSyncCtxStruct
+SlotSyncSkipReason
SlruCtl
SlruCtlData
SlruErrorCause
--
2.34.1
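For readers following along, the enum-to-string mapping that GetSlotSyncSkipReason() implements in patch 0002 above can be sketched as standalone C. The enum members and strings are copied from the patch; the function name `slotsync_skip_reason_name` is hypothetical and not the actual backend symbol:

```c
#include <assert.h>
#include <string.h>

/* Illustrative copy of the SlotSyncSkipReason enum from the patch */
typedef enum SlotSyncSkipReason
{
    SS_SKIP_NONE,                   /* no skip */
    SS_SKIP_WAL_NOT_FLUSHED,        /* standby behind remote confirmed flush */
    SS_SKIP_REMOTE_BEHIND,          /* remote slot behind the local slot */
    SS_SKIP_NO_CONSISTENT_SNAPSHOT, /* no consistent snapshot on standby */
    SS_SKIP_INVALID                 /* local slot is invalidated */
} SlotSyncSkipReason;

/*
 * Map a skip reason to the text shown in pg_replication_slots.
 * Keeping the switch exhaustive (no default case) lets the compiler
 * warn when a new enum member is added without a matching string.
 */
static const char *
slotsync_skip_reason_name(SlotSyncSkipReason reason)
{
    switch (reason)
    {
        case SS_SKIP_NONE:
            return "none";
        case SS_SKIP_WAL_NOT_FLUSHED:
            return "wal_not_flushed";
        case SS_SKIP_REMOTE_BEHIND:
            return "remote_behind";
        case SS_SKIP_NO_CONSISTENT_SNAPSHOT:
            return "no_consistent_snapshot";
        case SS_SKIP_INVALID:
            return "slot_invalidated";
    }
    return "none";          /* not reached while the switch stays exhaustive */
}
```

The same pattern is used elsewhere in the backend (e.g. for invalidation_reason), which is why falling through to a fallback rather than a default case is preferred here.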
On Fri, Nov 21, 2025 at 6:21 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
The CFbot complained that it was not able to build the docs. I have
fixed it and attached the latest patch.
Few comments on 0001:
1.
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_last_skip_at</structfield><type>timestamp
with time zone</type>
+ </para>
+ <para>
+ Time at which last slot synchronization was skipped.
How important is it to use "last" in the above field name? Isn't
slotsync_skip_at sufficient, since the description already says it all?
2.
+ /* If slot is present on the local, update the slot sync skip stats */
+ if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ {
+ bool synced;
+
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
+
+ if (synced)
+ {
+ ReplicationSlotAcquire(NameStr(slot->data.name), true, false);
+
+ pgstat_report_replslotsync_skip(slot);
+
+ ReplicationSlotRelease();
+ }
+ }
I think acquiring the slot just for reporting the stats seems
excessive. How about instead moving the check
if (remote_slot->confirmed_lsn > latestFlushPtr) into the later if/else
blocks where we already have the slot acquired?
In the if block, we can move it just before the below check:
if (slot->data.persistency == RS_TEMPORARY)
{
slot_updated = update_and_persist_local_synced_slot(remote_slot,
In the else block, just before the following:
update_and_persist_local_synced_slot(remote_slot, remote_dbid);
3.
We can only get here if pgstat_create_replslot() or
+ * pgstat_acquire_replslot() have already been called.
+ */
+void
+pgstat_report_replslotsync_skip(ReplicationSlot *slot)
3A. Instead of having such a comment, it is better to have an
elog(ERROR, ...) or an assertion if the stats entry doesn't exist by
this time.
3B. Also, let's name the function as pgstat_report_replslotsync().
4. I think the current test used in the patch is very complex as it
requires multiple server restarts. Instead, we can use a test similar
to what is being used in the patch proposed in the thread [1].
[1]: /messages/by-id/CAFPTHDYgcvZ60eHWUav3VQSeVibivx7A31rp_pFAkMQrW=j=5A@mail.gmail.com
--
With Regards,
Amit Kapila.
On Fri, Nov 21, 2025 at 6:21 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
The CFbot complained that it was not able to build the docs. I have
fixed it and attached the latest patch.
Few comments on 001:
1)
In pgstat_report_replslotsync_skip(), shall we have a sanity check to
ensure that the slot is logical and that the function is called on a
standby (RecoveryInProgress)?
2)
In update_and_persist_local_synced_slot(), we have comments at three
places indicating that stats are updated in some other function.
Instead, shall we have a generic comment in the header of this
function?
3)
Shall we have the test moved to the existing file
040_standby_failover_slots_sync?
4)
We should be able to make the test work without an injection point;
please try for that. Also, it should be enough to test the stats for
one flow instead of multiple flows.
thanks
Shveta
On Mon, Nov 24, 2025 at 11:24 AM shveta malik <shveta.malik@gmail.com> wrote:
On Fri, Nov 21, 2025 at 6:21 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
The CFbot complained that it was not able to build the docs. I have
fixed it and attached the latest patch.
Few comments on 001:
Fixed these and other open comments on 0001 and pushed. But I see a BF
failure [1]. I think the reason is that the patch forgot to
release the slot in one of the code paths. I'll investigate it a bit
more and push the fix.
[1]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=scorpion&dt=2025-11-25%2007%3A38%3A19&stg=recovery-check
--
With Regards,
Amit Kapila.
On Tuesday, November 25, 2025 4:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Nov 24, 2025 at 11:24 AM shveta malik <shveta.malik@gmail.com>
wrote:
On Fri, Nov 21, 2025 at 6:21 PM Shlok Kyal <shlok.kyal.oss@gmail.com>
wrote:
The CFbot complained that it was not able to build the docs. I have
fixed it and attached the latest patch.
Few comments on 001:
Fixed these and other open comments on 0001 and pushed. But I see a BF
failure [1]. I think the reason is that the patch forgot to release the
slot in one of the code paths.
Right, I agree. Here is the patch to release the slot at necessary places.
Best Regards,
Hou zj
Attachments:
v1-0001-Fix-a-BF-failure-where-a-replication-slot-was-not.patch (application/octet-stream)
From 2fc8dc66a5c77a3b86f25daefdbbc8015dd7ac2f Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@fujitsu.com>
Date: Tue, 25 Nov 2025 16:08:35 +0800
Subject: [PATCH v1] Fix a BF failure where a replication slot was not released
in time
The commit 76b7872 did not release the replication slot in the slotsync worker
when a cycle of slot synchronization was skipped due to the required WAL not
being received and flushed on the standby server. This commit fixes that
issue.
---
src/backend/replication/logical/slotsync.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 7e9dc7f18bd..1f4f06d467b 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -725,6 +725,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
remote_slot->name,
LSN_FORMAT_ARGS(latestFlushPtr)));
+ ReplicationSlotRelease();
+
return slot_updated;
}
@@ -824,6 +826,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
remote_slot->name,
LSN_FORMAT_ARGS(latestFlushPtr)));
+ ReplicationSlotRelease();
+
return false;
}
--
2.51.1.windows.1
Dear Hou, Amit,
Right, I agree. Here is the patch to release the slot at necessary places.
Thanks for working on it. However, the BF machines are not yet satisfied with the fix.
There are still two failures after 3df4df53b06 [1] [2].
The reported issue is that the standby server failed to synchronize the slot after
the slot was re-created on the primary. According to [1], the slot on the standby has a
newer catalog xmin than the one on the primary. Like:
```
LOG: could not synchronize replication slot "lsub1_slot"
DETAIL: Synchronization could lead to data loss, because the remote slot needs WAL at LSN 0/030163A8 and catalog xmin 758, but the standby has LSN 0/030163A8 and catalog xmin 759.
```
Per analysis, the newly created logical slot on the primary has an initial
catalog_xmin of 758 because the physical slot is holding catalog_xmin 758. The
standby has no slots, so the new slot there gets the latest xid (759) as its
catalog_xmin.
Anyway, I think this is a test issue.
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2025-11-25%2009%3A03%3A17
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2025-11-25%2009%3A01%3A08
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Tuesday, November 25, 2025 6:30 PM Kuroda, Hayato <kuroda.hayato@fujitsu.com> wrote:
Dear Hou, Amit,
Right, I agree. Here is the patch to release the slot at necessary places.
Thanks for working on it. However, the BF machines are not yet satisfied with the fix.
There are still two failures after 3df4df53b06 [1] [2].
The reported issue is that the standby server failed to synchronize the slot after
the slot was re-created on the primary. According to [1], the slot on the standby has a
newer catalog xmin than the one on the primary. Like:
```
LOG: could not synchronize replication slot "lsub1_slot"
DETAIL: Synchronization could lead to data loss, because the remote slot
needs WAL at LSN 0/030163A8 and catalog xmin 758, but the standby has
LSN 0/030163A8 and catalog xmin 759.
```
Per analysis, the newly created logical slot on the primary has an initial
catalog_xmin of 758 because the physical slot is holding catalog_xmin 758. The
standby has no slots, so the new slot there gets the latest xid (759) as its
catalog_xmin.
Anyway, I think this is a test issue.
The issue is that the physical slot on the primary retains a catalog_xmin of
758, causing newly created slots to inherit the same catalog_xmin. In contrast,
the standby, lacking slots, assigns an initial catalog_xmin of 759 to newly
synced slots. The problem arises because the logical slot on the primary isn't
being consumed, preventing the catalog_xmin from advancing, which leads to the
test timing out.
Previously, we avoided this issue by intentionally preventing xid assignment
during slotsync tests, ensuring xmin/catalog_xmin remained static in most cases.
However, the new test performs some DDL in between, which causes this issue.
Rather than adding additional wait events for control, we discussed relocating
the test to the end, after promoting the standby, where syncing the slot
successfully isn't necessary. Since the test's goal is solely to verify slotsync
skip statistics, this approach should suffice.
Here is the patch to modify the test.
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2025-11-25%2009%3A03%3A17
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2025-11-25%2009%3A01%3A08
Best Regards,
Hou zj
Attachments:
v1-0001-Fix-test-failure-caused-by-commit-76b78721ca.patch (application/octet-stream)
From cd5a916631b4d34884cde3aa6f29f12427146bc9 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@fujitsu.com>
Date: Tue, 25 Nov 2025 18:42:55 +0800
Subject: [PATCH v1] Fix test failure caused by commit 76b78721ca
The test failed because it assumed that a newly created logical
replication slot could be synced to the standby by the slotsync worker.
However, the presence of an existing physical slot caused the new logical
slot to use a non-latest xmin. On the standby, the DDL had already been
replayed, advancing xmin, which led to the slotsync worker failing to sync
the lagging logical slot.
To resolve this, we moved the slot sync statistics tests to run after the
tests that do not require the newly created slot to be sync-ready.
---
.../t/040_standby_failover_slots_sync.pl | 128 ++++++++++--------
1 file changed, 70 insertions(+), 58 deletions(-)
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index b2bf5072bbf..8e682b88ead 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -213,75 +213,19 @@ is( $standby1->safe_psql(
##################################################
# Test that the synchronized slot will be dropped if the corresponding remote
# slot on the primary server has been dropped.
-#
-# Note: Both slots need to be dropped for the next test to work
##################################################
$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub2_slot');");
-$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub1_slot');");
$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
is( $standby1->safe_psql(
'postgres',
- q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot');}
+ q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name = 'lsub2_slot';}
),
"t",
'synchronized slot has been dropped');
-##################################################
-# Verify that slotsync skip statistics are correctly updated when the
-# slotsync operation is skipped.
-##################################################
-
-# Create a logical replication slot and create some DDL on the primary so
-# that the slot lags behind the standby.
-$primary->safe_psql(
- 'postgres', qq(
- SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);
- CREATE TABLE wal_push(a int);
-));
-$primary->wait_for_replay_catchup($standby1);
-
-my $log_offset = -s $standby1->logfile;
-
-# Enable slot sync worker.
-$standby1->append_conf('postgresql.conf', qq(sync_replication_slots = on));
-$standby1->reload;
-
-# Confirm that the slot sync worker is able to start.
-$standby1->wait_for_log(qr/slot sync worker started/, $log_offset);
-
-# Confirm that the slot sync is skipped due to the remote slot lagging behind
-$standby1->wait_for_log(
- qr/could not synchronize replication slot \"lsub1_slot\"/, $log_offset);
-
-# Confirm that the slotsync skip statistics is updated
-$result = $standby1->safe_psql('postgres',
- "SELECT slotsync_skip_count > 0 FROM pg_stat_replication_slots WHERE slot_name = 'lsub1_slot'"
-);
-is($result, 't', "check slot sync skip count increments");
-
-# Clean the table
-$primary->safe_psql(
- 'postgres', qq(
- DROP TABLE wal_push;
-));
-$primary->wait_for_replay_catchup($standby1);
-
-# Re-create the logical replication slot and sync it to standby for further tests
-$primary->safe_psql(
- 'postgres', qq(
- SELECT pg_drop_replication_slot('lsub1_slot');
- SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);
-));
-$standby1->wait_for_log(
- qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
- $log_offset);
-
-$standby1->append_conf('postgresql.conf', qq(sync_replication_slots = off));
-$standby1->reload;
-
##################################################
# Test that if the synchronized slot is invalidated while the remote slot is
# still valid, the slot will be dropped and re-created on the standby by
@@ -337,7 +281,7 @@ $inactive_since_on_primary =
# the failover slots.
$primary->wait_for_replay_catchup($standby1);
-$log_offset = -s $standby1->logfile;
+my $log_offset = -s $standby1->logfile;
# Synchronize the primary server slots to the standby.
$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
@@ -1043,4 +987,72 @@ $result = $standby1->safe_psql('postgres',
is($result, '1', "data can be consumed using snap_test_slot");
+##################################################
+# Verify that slotsync skip statistics are correctly updated when the
+# slotsync operation is skipped.
+##################################################
+
+# Enable slot sync on standby2
+$standby2->append_conf('postgresql.conf', qq(
+hot_standby_feedback = on
+primary_conninfo = '$connstr_1 dbname=postgres'
+log_min_messages = 'debug2'
+));
+
+$standby2->reload;
+
+# Commit the pending prepared transaction
+$primary->safe_psql('postgres', "COMMIT PREPARED 'test_twophase_slotsync';");
+$primary->wait_for_replay_catchup($standby2);
+
+# Remove all logical replication slots on the primary server to ensure the
+# corresponding synced slots are also removed. This guarantees that the safest
+# catalog_xmin on the standby is not preserved by existing slots, allowing newly
+# created slots to have a fresher initial catalog_xmin.
+$primary->psql('postgres', qq(
+ SELECT pg_drop_replication_slot('lsub1_slot');
+ SELECT pg_drop_replication_slot('snap_test_slot');
+));
+
+$subscriber2->safe_psql(
+ 'postgres', 'DROP SUBSCRIPTION regress_mysub2;');
+
+$standby2->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+is( $standby2->safe_psql(
+ 'postgres',
+ q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot', 'snap_test_slot');}
+ ),
+ "t",
+ 'synchronized slot has been dropped');
+
+# Create a logical replication slot and create some DDL on the primary so
+# that the slot lags behind the standby.
+$primary->safe_psql(
+ 'postgres', qq(
+ SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);
+ CREATE TABLE wal_push(a int);
+));
+$primary->wait_for_replay_catchup($standby2);
+
+$log_offset = -s $standby2->logfile;
+
+# Enable slot sync worker
+$standby2->append_conf('postgresql.conf', qq(sync_replication_slots = on));
+
+$standby2->reload;
+
+# Confirm that the slot sync worker is able to start.
+$standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
+
+# Confirm that the slot sync is skipped due to the remote slot lagging behind
+$standby2->wait_for_log(
+ qr/could not synchronize replication slot \"lsub1_slot\"/, $log_offset);
+
+# Confirm that the slotsync skip statistics is updated
+$result = $standby2->safe_psql('postgres',
+ "SELECT slotsync_skip_count > 0 FROM pg_stat_replication_slots WHERE slot_name = 'lsub1_slot'"
+);
+is($result, 't', "check slot sync skip count increments");
+
done_testing();
--
2.51.1.windows.1
On Tue, Nov 25, 2025 at 5:18 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
Here is the patch to modify the test.
+##################################################
+# Verify that slotsync skip statistics are correctly updated when the
+# slotsync operation is skipped.
+##################################################
....
+$standby2->reload;
+
+# Commit the pending prepared transaction
+$primary->safe_psql('postgres', "COMMIT PREPARED 'test_twophase_slotsync';");
+$primary->wait_for_replay_catchup($standby2);
+
+# Remove all logical replication slots on the primary server to ensure the
+# corresponding synced slots are also removed. This guarantees that the safest
+# catalog_xmin on the standby is not preserved by existing slots,
allowing newly
+# created slots to have a fresher initial catalog_xmin.
+$primary->psql('postgres', qq(
+ SELECT pg_drop_replication_slot('lsub1_slot');
+ SELECT pg_drop_replication_slot('snap_test_slot');
+));
+
+$subscriber2->safe_psql(
+ 'postgres', 'DROP SUBSCRIPTION regress_mysub2;');
+
+$standby2->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+is( $standby2->safe_psql(
+ 'postgres',
+ q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name IN
('lsub1_slot', 'lsub2_slot', 'snap_test_slot');}
+ ),
+ "t",
+ 'synchronized slot has been dropped');
This is too much dependency of previous tests on the new one. We
should do cleanup of previous tests separately, if we want to use some
existing set up. Also, do we need to ensure that standby2's slots are
dropped? Did we ever sync slots on standby2? If so, cleaning up here
looks odd. This is to make new tests rely less on the outcome of
previous tests.
--
With Regards,
Amit Kapila.
On Wednesday, November 26, 2025 11:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Nov 25, 2025 at 5:18 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
wrote:
Here is the patch to modify the test.
This is too much dependency of previous tests on the new one. We should do
cleanup of previous tests separately, if we want to use some existing set up.
Also, do we need to ensure that standby2's slots are dropped? Did we ever
sync slots on standby2? If so, cleaning up here looks odd. This is to make new
tests rely less on the outcome of previous tests.
I think we did not sync slots to standby2, so I removed the checks for that.
I also adjusted the test in a way that it cleans up existing slots before starting
new tests.
Here is the updated version.
Best Regards,
Hou zj
Attachments:
v2-0001-Fix-test-failure-caused-by-commit-76b78721ca.patch (application/octet-stream)
From de3d8dc82cb4eefc1d516091f6fc5bd5dc9252cd Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@fujitsu.com>
Date: Tue, 25 Nov 2025 18:42:55 +0800
Subject: [PATCH v2] Fix test failure caused by commit 76b78721ca
The test failed because it assumed that a newly created logical
replication slot could be synced to the standby by the slotsync worker.
However, the presence of an existing physical slot caused the new logical
slot to use a non-latest xmin. On the standby, the DDL had already been
replayed, advancing xmin, which led to the slotsync worker failing to sync
the lagging logical slot.
To resolve this, we moved the slot sync statistics tests to run after the
tests that do not require the newly created slot to be sync-ready.
---
.../t/040_standby_failover_slots_sync.pl | 128 ++++++++++--------
1 file changed, 70 insertions(+), 58 deletions(-)
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index b2bf5072bbf..7d3c82e0a29 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -213,75 +213,19 @@ is( $standby1->safe_psql(
##################################################
# Test that the synchronized slot will be dropped if the corresponding remote
# slot on the primary server has been dropped.
-#
-# Note: Both slots need to be dropped for the next test to work
##################################################
$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub2_slot');");
-$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub1_slot');");
$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
is( $standby1->safe_psql(
'postgres',
- q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot');}
+ q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name = 'lsub2_slot';}
),
"t",
'synchronized slot has been dropped');
-##################################################
-# Verify that slotsync skip statistics are correctly updated when the
-# slotsync operation is skipped.
-##################################################
-
-# Create a logical replication slot and create some DDL on the primary so
-# that the slot lags behind the standby.
-$primary->safe_psql(
- 'postgres', qq(
- SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);
- CREATE TABLE wal_push(a int);
-));
-$primary->wait_for_replay_catchup($standby1);
-
-my $log_offset = -s $standby1->logfile;
-
-# Enable slot sync worker.
-$standby1->append_conf('postgresql.conf', qq(sync_replication_slots = on));
-$standby1->reload;
-
-# Confirm that the slot sync worker is able to start.
-$standby1->wait_for_log(qr/slot sync worker started/, $log_offset);
-
-# Confirm that the slot sync is skipped due to the remote slot lagging behind
-$standby1->wait_for_log(
- qr/could not synchronize replication slot \"lsub1_slot\"/, $log_offset);
-
-# Confirm that the slotsync skip statistics is updated
-$result = $standby1->safe_psql('postgres',
- "SELECT slotsync_skip_count > 0 FROM pg_stat_replication_slots WHERE slot_name = 'lsub1_slot'"
-);
-is($result, 't', "check slot sync skip count increments");
-
-# Clean the table
-$primary->safe_psql(
- 'postgres', qq(
- DROP TABLE wal_push;
-));
-$primary->wait_for_replay_catchup($standby1);
-
-# Re-create the logical replication slot and sync it to standby for further tests
-$primary->safe_psql(
- 'postgres', qq(
- SELECT pg_drop_replication_slot('lsub1_slot');
- SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);
-));
-$standby1->wait_for_log(
- qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
- $log_offset);
-
-$standby1->append_conf('postgresql.conf', qq(sync_replication_slots = off));
-$standby1->reload;
-
##################################################
# Test that if the synchronized slot is invalidated while the remote slot is
# still valid, the slot will be dropped and re-created on the standby by
@@ -337,7 +281,7 @@ $inactive_since_on_primary =
# the failover slots.
$primary->wait_for_replay_catchup($standby1);
-$log_offset = -s $standby1->logfile;
+my $log_offset = -s $standby1->logfile;
# Synchronize the primary server slots to the standby.
$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
@@ -1043,4 +987,72 @@ $result = $standby1->safe_psql('postgres',
is($result, '1', "data can be consumed using snap_test_slot");
+##################################################
+# Remove any unnecessary replication slots and clear pending transactions on the
+# primary server to ensure a clean environment.
+##################################################
+
+$primary->psql(
+ 'postgres', qq(
+ SELECT pg_drop_replication_slot('sb1_slot');
+ SELECT pg_drop_replication_slot('lsub1_slot');
+ SELECT pg_drop_replication_slot('snap_test_slot');
+));
+
+$subscriber2->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub2;');
+
+# Verify that all slots have been removed except the one necessary for standby2,
+# which is needed for further testing.
+is( $primary->safe_psql(
+ 'postgres',
+ q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name != 'sb2_slot';}
+ ),
+ "t",
+ 'all replication slots have been dropped except the physical slot used by standby2'
+);
+
+# Commit the pending prepared transaction
+$primary->safe_psql('postgres', "COMMIT PREPARED 'test_twophase_slotsync';");
+$primary->wait_for_replay_catchup($standby2);
+
+##################################################
+# Verify that slotsync skip statistics are correctly updated when the
+# slotsync operation is skipped.
+##################################################
+
+# Create a logical replication slot and create some DDL on the primary so
+# that the slot lags behind the standby.
+$primary->safe_psql(
+ 'postgres', qq(
+ SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);
+ CREATE TABLE wal_push(a int);
+));
+$primary->wait_for_replay_catchup($standby2);
+
+$log_offset = -s $standby2->logfile;
+
+# Enable slot sync worker
+$standby2->append_conf(
+ 'postgresql.conf', qq(
+hot_standby_feedback = on
+primary_conninfo = '$connstr_1 dbname=postgres'
+log_min_messages = 'debug2'
+sync_replication_slots = on
+));
+
+$standby2->reload;
+
+# Confirm that the slot sync worker is able to start.
+$standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
+
+# Confirm that the slot sync is skipped due to the remote slot lagging behind
+$standby2->wait_for_log(
+ qr/could not synchronize replication slot \"lsub1_slot\"/, $log_offset);
+
+# Confirm that the slotsync skip statistics is updated
+$result = $standby2->safe_psql('postgres',
+ "SELECT slotsync_skip_count > 0 FROM pg_stat_replication_slots WHERE slot_name = 'lsub1_slot'"
+);
+is($result, 't', "check slot sync skip count increments");
+
done_testing();
--
2.51.1.windows.1
Dear Hou,
I think we did not sync slots to standby2, so I removed the checks for that.
I also adjusted the test in a way that it cleans up existing slots before starting
new tests.
Thanks for updating the patch. I confirmed on my env that your patch applies
cleanly and the tests pass. Pgperltidy reports nothing for your patch.
Also, I prefer the current style.
I think it is worth checking on BF to see what they say.
On Wed, 26 Nov 2025 at 09:00, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Hou,
I think we did not sync slots to standby2, so I removed the checks for that.
I also adjusted the test in a way that it cleans up existing slots before starting
new tests.
Thanks for updating the patch. I confirmed on my env that your patch applies
cleanly and the tests pass. Pgperltidy reports nothing for your patch.
Also, I prefer the current style. I think it is worth checking on BF to see what they say.
Thanks Amit for pushing the 0001 patch.
Thanks Hou-san and Kuroda-san for fixing the test.
I have rebased the 0002 patch on the current HEAD.
Thanks,
Shlok Kyal
Attachments:
v12-0001-Add-slotsync_skip_reason-to-pg_replication_slots.patch (application/octet-stream)
From 727d1df40a9cc38ebb5cb525e4dfa714799ae6e0 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Wed, 26 Nov 2025 08:52:29 +0530
Subject: [PATCH v12] Add slotsync_skip_reason to pg_replication_slots
This patch introduces a new column slotsync_skip_reason to the
pg_replication_slots view. It indicates the reason for the last slot
synchronization skip.
---
doc/src/sgml/monitoring.sgml | 4 +-
doc/src/sgml/system-views.sgml | 49 +++++++++++++++++++
src/backend/catalog/system_views.sql | 3 +-
src/backend/replication/logical/slotsync.c | 43 ++++++++++++++--
src/backend/replication/slot.c | 1 +
src/backend/replication/slotfuncs.c | 28 ++++++++++-
src/include/catalog/pg_proc.dat | 6 +--
src/include/replication/slot.h | 26 ++++++++++
.../t/040_standby_failover_slots_sync.pl | 6 +++
src/test/regress/expected/rules.out | 5 +-
src/tools/pgindent/typedefs.list | 1 +
11 files changed, 158 insertions(+), 14 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dcc8474a7f7..e0556b6baac 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1665,7 +1665,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para>
<para>
Number of times the slot synchronization is skipped. Slot
- synchronization occur only on standby servers and thus this column has
+ synchronization occurs only on standby servers and thus this column has
no meaning on the primary server.
</para>
</entry>
@@ -1677,7 +1677,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para>
<para>
Time at which last slot synchronization was skipped. Slot
- synchronization occur only on standby servers and thus this column has
+ synchronization occurs only on standby servers and thus this column has
no meaning on the primary server.
</para>
</entry>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 0e623e7fb86..4dd5174ba1e 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3102,6 +3102,55 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ The reason for the last slot synchronization skip. This field is set only
+ for logical slots that are being synchronized from a primary server (that
+ is, those whose <structfield>synced</structfield> field is
+ <literal>true</literal>). The value of this column has no meaning on the
+ primary server; it defaults to <literal>none</literal> for all slots, but
+ may (if leftover from a promoted standby) also have a value other than
+ <literal>none</literal>. Possible values are:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <literal>none</literal> means that the last slot synchronization
+ completed successfully.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>remote_behind</literal> means that the last slot
+ synchronization was skipped because the slot is ahead of the
+ corresponding failover slot on the primary.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>wal_not_flushed</literal> means that the last slot
+ synchronization was skipped because the standby had not flushed the
+ WAL corresponding to the confirmed flush position on the remote slot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>no_consistent_snapshot</literal> means that the last slot
+ synchronization was skipped because the standby could not build a
+ consistent snapshot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>slot_invalidated</literal> means that the last slot
+ synchronization was skipped because the slot is invalidated.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6fffdb9398e..086c4c8fb6f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1060,7 +1060,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.slotsync_skip_reason
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 1f4f06d467b..818f432172c 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -148,6 +148,35 @@ typedef struct RemoteSlot
static void slotsync_failure_callback(int code, Datum arg);
static void update_synced_slots_inactive_since(void);
+/*
+ * Update slot sync skip stats. This function requires the caller to acquire
+ * the slot.
+ */
+static void
+update_slotsync_skip_stats(SlotSyncSkipReason skip_reason)
+{
+ ReplicationSlot *slot;
+
+ Assert(MyReplicationSlot);
+
+ slot = MyReplicationSlot;
+
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slot when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslotsync(slot);
+
+ /* Update the slot sync skip reason */
+ if (slot->slotsync_skip_reason != skip_reason)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->slotsync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
+ }
+}
+
/*
* If necessary, update the local synced slot's metadata based on the data
* from the remote slot.
@@ -170,6 +199,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
ReplicationSlot *slot = MyReplicationSlot;
bool updated_xmin_or_lsn = false;
bool updated_config = false;
+ SlotSyncSkipReason skip_reason = SS_SKIP_NONE;
Assert(slot->data.invalidated == RS_INVAL_NONE);
@@ -188,7 +218,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
slot->data.catalog_xmin))
{
/* Update slot sync skip stats */
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_REMOTE_BEHIND);
/*
* This can happen in following situations:
@@ -286,12 +316,15 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
* persisted. See update_and_persist_local_synced_slot().
*/
if (found_consistent_snapshot && !(*found_consistent_snapshot))
- pgstat_report_replslotsync(slot);
+ skip_reason = SS_SKIP_NO_CONSISTENT_SNAPSHOT;
}
updated_xmin_or_lsn = true;
}
+ /* Update slot sync skip stats */
+ update_slotsync_skip_stats(skip_reason);
+
if (remote_dbid != slot->data.database ||
remote_slot->two_phase != slot->data.two_phase ||
remote_slot->failover != slot->data.failover ||
@@ -696,7 +729,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
/* Skip the sync of an invalidated slot */
if (slot->data.invalidated != RS_INVAL_NONE)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_INVALID);
ReplicationSlotRelease();
return slot_updated;
@@ -711,7 +744,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
*/
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
@@ -812,7 +845,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
*/
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1ec1e997b27..86ae99a3ca9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -491,6 +491,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
slot->last_saved_restart_lsn = InvalidXLogRecPtr;
slot->inactive_since = 0;
+ slot->slotsync_skip_reason = SS_SKIP_NONE;
/*
* Create the slot on disk. We haven't actually marked the slot allocated
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 0478fc9c977..7200c7f071d 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -228,6 +228,30 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * Map a SlotSyncSkipReason enum to a human-readable string
+ */
+static char *
+GetSlotSyncSkipReason(SlotSyncSkipReason reason)
+{
+ switch (reason)
+ {
+ case SS_SKIP_NONE:
+ return "none";
+ case SS_SKIP_REMOTE_BEHIND:
+ return "remote_behind";
+ case SS_SKIP_WAL_NOT_FLUSHED:
+ return "wal_not_flushed";
+ case SS_SKIP_NO_CONSISTENT_SNAPSHOT:
+ return "no_consistent_snapshot";
+ case SS_SKIP_INVALID:
+ return "slot_invalidated";
+ }
+
+ Assert(false);
+ return "none";
+}
+
/*
* pg_get_replication_slots - SQL SRF showing all replication slots
* that currently exist on the database cluster.
@@ -235,7 +259,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +467,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ values[i++] = CStringGetTextDatum(GetSlotSyncSkipReason(slot_contents.slotsync_skip_reason));
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 66431940700..66af2d96d67 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11519,9 +11519,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slotsync_skip_reason}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 09c69f83d57..054e81b6c43 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -71,6 +71,22 @@ typedef enum ReplicationSlotInvalidationCause
/* Maximum number of invalidation causes */
#define RS_INVAL_MAX_CAUSES 4
+/*
+ * When slot sync worker is running or pg_sync_replication_slots is run, the
+ * slot sync can be skipped. This enum keeps a list of reasons of slot sync
+ * skip.
+ */
+typedef enum SlotSyncSkipReason
+{
+ SS_SKIP_NONE, /* No skip */
+ SS_SKIP_WAL_NOT_FLUSHED, /* Standby did not flush the wal corresponding
+ * to confirmed flush of remote slot */
+ SS_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SS_SKIP_NO_CONSISTENT_SNAPSHOT, /* Standby could not build a consistent
+ * snapshot */
+ SS_SKIP_INVALID /* Local slot is invalid */
+} SlotSyncSkipReason;
+
/*
* On-Disk data of a replication slot, preserved across restarts.
*/
@@ -249,6 +265,16 @@ typedef struct ReplicationSlot
*/
XLogRecPtr last_saved_restart_lsn;
+ /*
+ * The reason for last slot sync skip.
+ *
+ * A slotsync skip typically occurs only for temporary slots. For
+ * persistent slots it is extremely rare (e.g., cases like
+ * SS_SKIP_WAL_NOT_FLUSHED or SS_SKIP_REMOTE_BEHIND). Also, temporary
+ * slots are dropped after server restart, so there is no value in
+ * persisting the slotsync_skip_reason.
+ */
+ SlotSyncSkipReason slotsync_skip_reason;
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 7d3c82e0a29..da1c3aee65d 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -1049,6 +1049,12 @@ $standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
$standby2->wait_for_log(
qr/could not synchronize replication slot \"lsub1_slot\"/, $log_offset);
+# Confirm that the slotsync skip reason is updated
+$result = $standby2->safe_psql('postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots WHERE slot_name = 'lsub1_slot'"
+);
+is($result, 'remote_behind', "check slot sync skip reason");
+
# Confirm that the slotsync skip statistics is updated
$result = $standby2->safe_psql('postgres',
"SELECT slotsync_skip_count > 0 FROM pg_stat_replication_slots WHERE slot_name = 'lsub1_slot'"
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index c337f0bc30d..94e45dd4d57 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1507,8 +1507,9 @@ pg_replication_slots| SELECT l.slot_name,
l.conflicting,
l.invalidation_reason,
l.failover,
- l.synced
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
+ l.synced,
+ l.slotsync_skip_reason
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, slotsync_skip_reason)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index dfcd619bfee..c7e52cc2191 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2807,6 +2807,7 @@ SlabSlot
SlotInvalidationCauseMap
SlotNumber
SlotSyncCtxStruct
+SlotSyncSkipReason
SlruCtl
SlruCtlData
SlruErrorCause
--
2.34.1
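For readers following along: with a patch along these lines applied, the cause of a sync delay becomes visible from SQL on the standby. A minimal sketch, assuming the slotsync_skip_reason column proposed in the patch above (in this version a successful sync shows none):

```sql
-- Run on the standby that is synchronizing failover slots.
-- slotsync_skip_reason shows why the most recent sync attempt
-- for each slot was skipped.
SELECT slot_name, synced, slotsync_skip_reason
FROM pg_replication_slots
WHERE failover;
```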
On Wed, Nov 26, 2025 at 9:30 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
On Wed, 26 Nov 2025 at 09:00, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
Dear Hou,
I think we did not sync slots to standby2, so I removed the checks for that.
I also adjusted the test in a way that it cleans up existing slots before starting new tests.
Thanks for updating the patch. I confirmed on my env that your patch could be applied cleanly and the tests passed. Pgperltidy says nothing for your patch. Also, I preferred the current style.
I think it is worth checking on the BF to see what it says.
Thanks Amit for pushing the 0001 patch.
Thanks Hou-san and Kuroda-san for fixing the test.
I have rebased the 0002 patch on the current HEAD.
Thanks. Please find a few comments:
1)
+ The reason for the last slot synchronization skip. This field is set only
+ for logical slots that are being synchronized from a primary server (that
+ is, those whose <structfield>synced</structfield> field is
+ <literal>true</literal>). The value of this column has no meaning on the
+ primary server; it defaults to <literal>none</literal> for all slots, but
+ may (if leftover from a promoted standby) also have a value other than
+ <literal>none</literal>. Possible values are:
We can make this similar to existing fields (slotsync_skip_count and
slotsync_skip_at):
Slot synchronization occurs only on standby servers and thus this column has
no meaning on the primary server.
2)
Doc on possible values of slotsync_skip_reason can be improved. Example
+ <literal>remote_behind</literal> means that the last slot
+ synchronization was skipped because the slot is ahead of the
+ corresponding failover slot on the primary.
We can get rid of 'last slot synchronization was skipped' from all the
reasons. (See 'invalidation_reason' possible values for reference, it
does not mention 'was invalidated' in any).
3)
+ <literal>wal_not_flushed</literal> means that the last slot
+ synchronization was skipped because the standby had not flushed the
+ WAL corresponding to the confirmed flush position on the remote slot.
I am not sure if we need to mention 'confirmed flush position'. Shall we say:
'.....because the standby had not flushed the WAL corresponding to the
position reserved on the remote slot'.
Thoughts?
4)
+ <literal>none</literal> means that the last slot synchronization
+ completed successfully.
Do we even need to mention 'none' in doc? 'invalidation_reason' does
not mention it.
5)
postgres=# select slot_name, invalidation_reason, slotsync_skip_reason
from pg_replication_slots;
slot_name | invalidation_reason | slotsync_skip_reason
----------------+---------------------+----------------------
failover_slot2 | | none
Shall we keep 'slotsync_skip_reason' as NULL instead of 'none' similar
to invalidation_reason. Thoughts?
6)
If we plan to keep slotsync_skip_reason as NULL instead of 'none' for
non-skipped cases (above comment), then below code can be optimised as
we do not need to update 'none' as stats.
'skip_reason' and last update() call can then be removed and we can
simply call update_slotsync_skip_stats() instead of 'skip_reason =
SS_SKIP_NO_CONSISTENT_SNAPSHOT'.
update_local_synced_slot():
+ SlotSyncSkipReason skip_reason = SS_SKIP_NONE;
+ update_slotsync_skip_stats(SS_SKIP_REMOTE_BEHIND);
+ skip_reason = SS_SKIP_NO_CONSISTENT_SNAPSHOT;
+ /* Update slot sync skip stats */
+ update_slotsync_skip_stats(skip_reason);
7)
+static char *
+GetSlotSyncSkipReason(SlotSyncSkipReason reason)
We are passing 'SlotSyncSkipReason' and function name says
'GetSlotSyncSkipReason', looks confusing.
Shall we rename function name to GetSlotSyncSkipString or
GetSlotSyncSkipReasonName (similar to GetSlotInvalidationCauseName)
8)
+ * A slotsync skip typically occurs only for temporary slots. For
+ * persistent slots it is extremely rare (e.g., cases like
+ * SS_SKIP_WAL_NOT_FLUSHED or SS_SKIP_REMOTE_BEHIND). Also, temporary
+ * slots are dropped after server restart, so there is no value in
+ * persisting the slotsync_skip_reason.
+ */
+ SlotSyncSkipReason slotsync_skip_reason;
I feel, 'Also-->Since' will make more sense here.
9)
In doc for slotsync_skip_count and slotsync_skip_at
+ Slot
+ synchronization occur only on standby servers and thus this column has
+ no meaning on the primary server.
occur --> occurs
thanks
Shveta
On Wed, 26 Nov 2025 at 10:25, shveta malik <shveta.malik@gmail.com> wrote:
Thanks. Please find a few comments:
1)
+ The reason for the last slot synchronization skip. This field is set only
+ for logical slots that are being synchronized from a primary server (that
+ is, those whose <structfield>synced</structfield> field is
+ <literal>true</literal>). The value of this column has no meaning on the
+ primary server; it defaults to <literal>none</literal> for all slots, but
+ may (if leftover from a promoted standby) also have a value other than
+ <literal>none</literal>. Possible values are:
We can make this similar to existing fields (slotsync_skip_count and slotsync_skip_at):
Slot synchronization occurs only on standby servers and thus this column has no meaning on the primary server.
2)
Doc on possible values of slotsync_skip_reason can be improved. Example:
+ <literal>remote_behind</literal> means that the last slot
+ synchronization was skipped because the slot is ahead of the
+ corresponding failover slot on the primary.
We can get rid of 'last slot synchronization was skipped' from all the reasons. (See 'invalidation_reason' possible values for reference; it does not mention 'was invalidated' in any.)
3)
+ <literal>wal_not_flushed</literal> means that the last slot
+ synchronization was skipped because the standby had not flushed the
+ WAL corresponding to the confirmed flush position on the remote slot.
I am not sure if we need to mention 'confirmed flush position'. Shall we say:
'.....because the standby had not flushed the WAL corresponding to the position reserved on the remote slot'.
Thoughts?
I think the suggested wording would be clearer for the user. Added the change.
4)
+ <literal>none</literal> means that the last slot synchronization
+ completed successfully.
Do we even need to mention 'none' in doc? 'invalidation_reason' does not mention it.
5)
postgres=# select slot_name, invalidation_reason, slotsync_skip_reason
from pg_replication_slots;
slot_name | invalidation_reason | slotsync_skip_reason
----------------+---------------------+----------------------
 failover_slot2 |                     | none
Shall we keep 'slotsync_skip_reason' as NULL instead of 'none', similar to invalidation_reason? Thoughts?
I agree with your suggestion. I have removed the 'none' value and used
NULL instead.
6)
If we plan to keep slotsync_skip_reason as NULL instead of 'none' for
non-skipped cases (above comment), then below code can be optimised as
we do not need to update 'none' as stats.
'skip_reason' and last update() call can then be removed and we can
simply call update_slotsync_skip_stats() instead of 'skip_reason =
SS_SKIP_NO_CONSISTENT_SNAPSHOT'.
update_local_synced_slot():
+ SlotSyncSkipReason skip_reason = SS_SKIP_NONE;
+ update_slotsync_skip_stats(SS_SKIP_REMOTE_BEHIND);
+ skip_reason = SS_SKIP_NO_CONSISTENT_SNAPSHOT;
+ /* Update slot sync skip stats */
+ update_slotsync_skip_stats(skip_reason);
I think we need this change even if we use NULL instead of 'none'.
This change ensures that the slot sync reason is set to NULL if slot
sync is successful.
7)
+static char *
+GetSlotSyncSkipReason(SlotSyncSkipReason reason)
We are passing 'SlotSyncSkipReason' and the function name says 'GetSlotSyncSkipReason', which looks confusing.
Shall we rename the function to GetSlotSyncSkipString or GetSlotSyncSkipReasonName (similar to GetSlotInvalidationCauseName)?
8)
+ * A slotsync skip typically occurs only for temporary slots. For
+ * persistent slots it is extremely rare (e.g., cases like
+ * SS_SKIP_WAL_NOT_FLUSHED or SS_SKIP_REMOTE_BEHIND). Also, temporary
+ * slots are dropped after server restart, so there is no value in
+ * persisting the slotsync_skip_reason.
+ */
+ SlotSyncSkipReason slotsync_skip_reason;
I feel 'Also-->Since' will make more sense here.
9)
In doc for slotsync_skip_count and slotsync_skip_at:
+ Slot
+ synchronization occur only on standby servers and thus this column has
+ no meaning on the primary server.
occur --> occurs
I have also addressed the remaining comments and attached the updated patch.
Thanks,
Shlok Kyal
Attachments:
v13-0001-Add-slotsync_skip_reason-to-pg_replication_slots.patch
From 2e54b6f9b83827822d05c8edd429e14477209b3f Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Wed, 26 Nov 2025 08:52:29 +0530
Subject: [PATCH v13] Add slotsync_skip_reason to pg_replication_slots
This patch introduces a new column slotsync_skip_reason to
pg_replication_slots view. This indicates the reason for last slot
synchronization skip.
---
doc/src/sgml/monitoring.sgml | 4 +-
doc/src/sgml/system-views.sgml | 39 +++++++++++++++++
src/backend/catalog/system_views.sql | 3 +-
src/backend/replication/logical/slotsync.c | 43 ++++++++++++++++---
src/backend/replication/slot.c | 1 +
src/backend/replication/slotfuncs.c | 31 ++++++++++++-
src/include/catalog/pg_proc.dat | 6 +--
src/include/replication/slot.h | 26 +++++++++++
.../t/040_standby_failover_slots_sync.pl | 6 +++
src/test/regress/expected/rules.out | 5 ++-
src/tools/pgindent/typedefs.list | 1 +
11 files changed, 151 insertions(+), 14 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dcc8474a7f7..e0556b6baac 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1665,7 +1665,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para>
<para>
Number of times the slot synchronization is skipped. Slot
- synchronization occur only on standby servers and thus this column has
+ synchronization occurs only on standby servers and thus this column has
no meaning on the primary server.
</para>
</entry>
@@ -1677,7 +1677,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para>
<para>
Time at which last slot synchronization was skipped. Slot
- synchronization occur only on standby servers and thus this column has
+ synchronization occurs only on standby servers and thus this column has
no meaning on the primary server.
</para>
</entry>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 0e623e7fb86..5a08a222ad3 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3102,6 +3102,45 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ The reason for the last slot synchronization skip. Slot synchronization
+ occurs only on standby servers and for synced slots (that is, those whose
+ <structfield>synced</structfield> field is <literal>true</literal>).
+ Thus, this column has no meaning on the primary server and slot which are
+ not synced. <literal>NULL</literal> if slot synchronization is not
+ performed yet or is successful. Possible values are:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <literal>remote_behind</literal> means that the slot is ahead of the
+ corresponding failover slot on the primary.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>wal_not_flushed</literal> means that the standby had not
+ flushed the WAL corresponding to the position reserved on the remote
+ slot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>no_consistent_snapshot</literal> means that the standby could
+ not build a consistent snapshot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>slot_invalidated</literal> means that the slot is invalidated.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6fffdb9398e..086c4c8fb6f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1060,7 +1060,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.slotsync_skip_reason
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 1f4f06d467b..818f432172c 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -148,6 +148,35 @@ typedef struct RemoteSlot
static void slotsync_failure_callback(int code, Datum arg);
static void update_synced_slots_inactive_since(void);
+/*
+ * Update slot sync skip stats. This function requires the caller to acquire
+ * the slot.
+ */
+static void
+update_slotsync_skip_stats(SlotSyncSkipReason skip_reason)
+{
+ ReplicationSlot *slot;
+
+ Assert(MyReplicationSlot);
+
+ slot = MyReplicationSlot;
+
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slot when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslotsync(slot);
+
+ /* Update the slot sync skip reason */
+ if (slot->slotsync_skip_reason != skip_reason)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->slotsync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
+ }
+}
+
/*
* If necessary, update the local synced slot's metadata based on the data
* from the remote slot.
@@ -170,6 +199,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
ReplicationSlot *slot = MyReplicationSlot;
bool updated_xmin_or_lsn = false;
bool updated_config = false;
+ SlotSyncSkipReason skip_reason = SS_SKIP_NONE;
Assert(slot->data.invalidated == RS_INVAL_NONE);
@@ -188,7 +218,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
slot->data.catalog_xmin))
{
/* Update slot sync skip stats */
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_REMOTE_BEHIND);
/*
* This can happen in following situations:
@@ -286,12 +316,15 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
* persisted. See update_and_persist_local_synced_slot().
*/
if (found_consistent_snapshot && !(*found_consistent_snapshot))
- pgstat_report_replslotsync(slot);
+ skip_reason = SS_SKIP_NO_CONSISTENT_SNAPSHOT;
}
updated_xmin_or_lsn = true;
}
+ /* Update slot sync skip stats */
+ update_slotsync_skip_stats(skip_reason);
+
if (remote_dbid != slot->data.database ||
remote_slot->two_phase != slot->data.two_phase ||
remote_slot->failover != slot->data.failover ||
@@ -696,7 +729,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
/* Skip the sync of an invalidated slot */
if (slot->data.invalidated != RS_INVAL_NONE)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_INVALID);
ReplicationSlotRelease();
return slot_updated;
@@ -711,7 +744,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
*/
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
@@ -812,7 +845,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
*/
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1ec1e997b27..86ae99a3ca9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -491,6 +491,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
slot->last_saved_restart_lsn = InvalidXLogRecPtr;
slot->inactive_since = 0;
+ slot->slotsync_skip_reason = SS_SKIP_NONE;
/*
* Create the slot on disk. We haven't actually marked the slot allocated
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 0478fc9c977..58262e1a9d8 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -228,6 +228,30 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * Map a SlotSyncSkipReason enum to a human-readable string
+ */
+static char *
+GetSlotSyncSkipReasonName(SlotSyncSkipReason reason)
+{
+ switch (reason)
+ {
+ case SS_SKIP_REMOTE_BEHIND:
+ return "remote_behind";
+ case SS_SKIP_WAL_NOT_FLUSHED:
+ return "wal_not_flushed";
+ case SS_SKIP_NO_CONSISTENT_SNAPSHOT:
+ return "no_consistent_snapshot";
+ case SS_SKIP_INVALID:
+ return "slot_invalidated";
+ default:
+ break;
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* pg_get_replication_slots - SQL SRF showing all replication slots
* that currently exist on the database cluster.
@@ -235,7 +259,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +467,11 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ if (slot_contents.slotsync_skip_reason == SS_SKIP_NONE)
+ nulls[i++] = true;
+ else
+ values[i++] = CStringGetTextDatum(GetSlotSyncSkipReasonName(slot_contents.slotsync_skip_reason));
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 66431940700..66af2d96d67 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11519,9 +11519,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slotsync_skip_reason}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 09c69f83d57..667b36b8a25 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -71,6 +71,22 @@ typedef enum ReplicationSlotInvalidationCause
/* Maximum number of invalidation causes */
#define RS_INVAL_MAX_CAUSES 4
+/*
+ * When slot sync worker is running or pg_sync_replication_slots is run, the
+ * slot sync can be skipped. This enum keeps a list of reasons of slot sync
+ * skip.
+ */
+typedef enum SlotSyncSkipReason
+{
+ SS_SKIP_NONE, /* No skip */
+ SS_SKIP_WAL_NOT_FLUSHED, /* Standby did not flush the wal corresponding
+ * to confirmed flush of remote slot */
+ SS_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SS_SKIP_NO_CONSISTENT_SNAPSHOT, /* Standby could not build a consistent
+ * snapshot */
+ SS_SKIP_INVALID /* Local slot is invalid */
+} SlotSyncSkipReason;
+
/*
* On-Disk data of a replication slot, preserved across restarts.
*/
@@ -249,6 +265,16 @@ typedef struct ReplicationSlot
*/
XLogRecPtr last_saved_restart_lsn;
+ /*
+ * The reason for last slot sync skip.
+ *
+ * A slotsync skip typically occurs only for temporary slots. For
+ * persistent slots it is extremely rare (e.g., cases like
+	 * SS_SKIP_WAL_NOT_FLUSHED or SS_SKIP_REMOTE_BEHIND). Since temporary
+	 * slots are dropped after server restart, there is no value in
+ * persisting the slotsync_skip_reason.
+ */
+ SlotSyncSkipReason slotsync_skip_reason;
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 7d3c82e0a29..da1c3aee65d 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -1049,6 +1049,12 @@ $standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
$standby2->wait_for_log(
qr/could not synchronize replication slot \"lsub1_slot\"/, $log_offset);
+# Confirm that the slotsync skip reason is updated
+$result = $standby2->safe_psql('postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots WHERE slot_name = 'lsub1_slot'"
+);
+is($result, 'remote_behind', "check slot sync skip reason");
+
# Confirm that the slotsync skip statistics is updated
$result = $standby2->safe_psql('postgres',
"SELECT slotsync_skip_count > 0 FROM pg_stat_replication_slots WHERE slot_name = 'lsub1_slot'"
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index c337f0bc30d..94e45dd4d57 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1507,8 +1507,9 @@ pg_replication_slots| SELECT l.slot_name,
l.conflicting,
l.invalidation_reason,
l.failover,
- l.synced
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
+ l.synced,
+ l.slotsync_skip_reason
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, slotsync_skip_reason)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index dfcd619bfee..c7e52cc2191 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2807,6 +2807,7 @@ SlabSlot
SlotInvalidationCauseMap
SlotNumber
SlotSyncCtxStruct
+SlotSyncSkipReason
SlruCtl
SlruCtlData
SlruErrorCause
--
2.34.1
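With this column in place, an end user on the standby could check why synchronization of a failover slot is being skipped directly from SQL instead of digging through the logs. An illustrative query against the view as defined in the patch (actual output depends on the cluster state):

```sql
-- Run on the standby: show synced failover slots whose last sync was skipped.
SELECT slot_name, synced, slotsync_skip_reason
FROM pg_replication_slots
WHERE synced AND slotsync_skip_reason IS NOT NULL;
```

A non-NULL value such as wal_not_flushed then points the user at a concrete area to investigate (e.g., the 'synchronized_standby_slots' GUC mentioned in the patch comments for that case).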
On Wed, Nov 26, 2025 at 11:58 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
I have also addressed the remaining comments and attached the updated patch.
Thanks. Please find a few comments:
1)
+ <structfield>slotsync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ The reason for the last slot synchronization skip. Slot synchronization
+ occurs only on standby servers and for synced slots (that is,
those whose
+ <structfield>synced</structfield> field is <literal>true</literal>).
+ Thus, this column has no meaning on the primary server and
slot which are
+ not synced. <literal>NULL</literal> if slot synchronization is not
+ performed yet or is successful. Possible values are:
Second line is confusing (..for synced slots..). Shall we rephrase the
complete thing to:
The reason for the last slot synchronization skip. Slot
synchronization occurs only on standby servers and thus this column
has no meaning on the primary server. It is relevant mainly for
logical slots on standby servers whose synced field is true. NULL if
slot synchronization is successful.
2)
Also in the 'possible values section', at one place we are using a
'failover slot on primary' and at another place 'remote slot'. We can
make these words also consistent. Shall we use 'failover slot on
primary' everywhere (as failover is a more widely used term than
remote in existing doc)?
3)
GetSlotSyncSkipReasonName()
+ default:
+ break;
+ }
+
+ Assert(false);
+ return NULL;
Shall we have :
default:
elog(ERROR, "unexpected slotsync skip reason: %d", (int) reason);
return NULL;
}
}
See other such functions: handle_streamed_transaction(),
RelationCreateStorage().
thanks
Shveta
On Wednesday, November 26, 2025 2:29 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
I have also addressed the remaining comments and attached the updated
patch.
Thanks for updating the patch, I have a few comments:
1.
+/*
+ * Map a SlotSyncSkipReason enum to a human-readable string
+ */
+static char *
+GetSlotSyncSkipReasonName(SlotSyncSkipReason reason)
Shall we add a static array to map the enum value to the reason name
instead of adding this function?
2.
+ <literal>remote_behind</literal> means that the slot is ahead of the
+ corresponding failover slot on the primary.
I think the current naming and doc are not easy for users to understand. So, I
suggest mentioning the explicit reason for this skip, e.g., that the required WALs
and rows have been removed or are at risk of removal. We can rename this reason to
"wal_or_rows_removed" and make the document similar to the content in
logicaldecoding.sgml.
Best Regards,
Hou zj
On Wed, 26 Nov 2025 at 14:23, Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
On Wednesday, November 26, 2025 2:29 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
I have also addressed the remaining comments and attached the updated
patch.
Thanks for updating the patch, I have a few comments:
1.
+/*
+ * Map a SlotSyncSkipReason enum to a human-readable string
+ */
+static char *
+GetSlotSyncSkipReasonName(SlotSyncSkipReason reason)
Shall we add a static array to map the enum value to the reason name
instead of adding this function?
I think a static array would be cleaner and consistent with
ConflictTypeNames and SlotInvalidationCauses.
Made the changes for the same.
2.
+ <literal>remote_behind</literal> means that the slot is ahead of the
+ corresponding failover slot on the primary.
I think the current naming and doc are not easy for users to understand. So, I
suggest mentioning the explicit reason for this skip, e.g., that the required WALs
and rows have been removed or are at risk of removal. We can rename this reason to
"wal_or_rows_removed" and make the document similar to the content in
logicaldecoding.sgml.
I agree. Included the changes for the same.
I have also addressed the comments by Shveta in [1].
[1]: /messages/by-id/CAJpy0uDKC0QubC0pL=bZ4Qnq3eQbykLnFu5x=wmDkOmL44QL7g@mail.gmail.com
Thanks,
Shlok Kyal
Attachments:
v14-0001-Add-slotsync_skip_reason-to-pg_replication_slots.patch (application/octet-stream)
From b0837f71b447db4977fbcba99f066170e025cfcc Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Wed, 26 Nov 2025 08:52:29 +0530
Subject: [PATCH v14] Add slotsync_skip_reason to pg_replication_slots
This patch introduces a new column, slotsync_skip_reason, in the
pg_replication_slots view. It indicates the reason for the last slot
synchronization skip.
---
doc/src/sgml/monitoring.sgml | 4 +-
doc/src/sgml/system-views.sgml | 39 +++++++++++++++++
src/backend/catalog/system_views.sql | 3 +-
src/backend/replication/logical/slotsync.c | 43 ++++++++++++++++---
src/backend/replication/slot.c | 1 +
src/backend/replication/slotfuncs.c | 16 ++++++-
src/include/catalog/pg_proc.dat | 6 +--
src/include/replication/slot.h | 26 +++++++++++
.../t/040_standby_failover_slots_sync.pl | 6 +++
src/test/regress/expected/rules.out | 5 ++-
src/tools/pgindent/typedefs.list | 1 +
11 files changed, 136 insertions(+), 14 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dcc8474a7f7..e0556b6baac 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1665,7 +1665,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para>
<para>
Number of times the slot synchronization is skipped. Slot
- synchronization occur only on standby servers and thus this column has
+ synchronization occurs only on standby servers and thus this column has
no meaning on the primary server.
</para>
</entry>
@@ -1677,7 +1677,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para>
<para>
Time at which last slot synchronization was skipped. Slot
- synchronization occur only on standby servers and thus this column has
+ synchronization occurs only on standby servers and thus this column has
no meaning on the primary server.
</para>
</entry>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 0e623e7fb86..e37ca7b2da7 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3102,6 +3102,45 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ The reason for the last slot synchronization skip. Slot
+ synchronization occurs only on standby servers and thus this column
+ has no meaning on the primary server. It is relevant mainly for
+ logical slots on standby servers whose <structfield>synced</structfield>
+ field is <literal>true</literal>. <literal>NULL</literal> if
+ slot synchronization is successful. Possible values are:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <literal>wal_or_rows_removed</literal> means that the required WALs or
+ catalog rows have already been removed from the standby.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>wal_not_flushed</literal> means that the standby had not
+ flushed the WAL corresponding to the position reserved on the failover
+ slot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>no_consistent_snapshot</literal> means that the standby could
+ not build a consistent snapshot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>slot_invalidated</literal> means that the slot is invalidated.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6fffdb9398e..086c4c8fb6f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1060,7 +1060,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.slotsync_skip_reason
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 1f4f06d467b..53c7d629239 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -148,6 +148,35 @@ typedef struct RemoteSlot
static void slotsync_failure_callback(int code, Datum arg);
static void update_synced_slots_inactive_since(void);
+/*
+ * Update slot sync skip stats. This function requires the caller to acquire
+ * the slot.
+ */
+static void
+update_slotsync_skip_stats(SlotSyncSkipReason skip_reason)
+{
+ ReplicationSlot *slot;
+
+ Assert(MyReplicationSlot);
+
+ slot = MyReplicationSlot;
+
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slot when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslotsync(slot);
+
+ /* Update the slot sync skip reason */
+ if (slot->slotsync_skip_reason != skip_reason)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->slotsync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
+ }
+}
+
/*
* If necessary, update the local synced slot's metadata based on the data
* from the remote slot.
@@ -170,6 +199,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
ReplicationSlot *slot = MyReplicationSlot;
bool updated_xmin_or_lsn = false;
bool updated_config = false;
+ SlotSyncSkipReason skip_reason = SS_SKIP_NONE;
Assert(slot->data.invalidated == RS_INVAL_NONE);
@@ -188,7 +218,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
slot->data.catalog_xmin))
{
/* Update slot sync skip stats */
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_WAL_OR_ROWS_REMOVED);
/*
* This can happen in following situations:
@@ -286,12 +316,15 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
* persisted. See update_and_persist_local_synced_slot().
*/
if (found_consistent_snapshot && !(*found_consistent_snapshot))
- pgstat_report_replslotsync(slot);
+ skip_reason = SS_SKIP_NO_CONSISTENT_SNAPSHOT;
}
updated_xmin_or_lsn = true;
}
+ /* Update slot sync skip stats */
+ update_slotsync_skip_stats(skip_reason);
+
if (remote_dbid != slot->data.database ||
remote_slot->two_phase != slot->data.two_phase ||
remote_slot->failover != slot->data.failover ||
@@ -696,7 +729,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
/* Skip the sync of an invalidated slot */
if (slot->data.invalidated != RS_INVAL_NONE)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_INVALID);
ReplicationSlotRelease();
return slot_updated;
@@ -711,7 +744,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
*/
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
@@ -812,7 +845,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
*/
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1ec1e997b27..86ae99a3ca9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -491,6 +491,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
slot->last_saved_restart_lsn = InvalidXLogRecPtr;
slot->inactive_since = 0;
+ slot->slotsync_skip_reason = SS_SKIP_NONE;
/*
* Create the slot on disk. We haven't actually marked the slot allocated
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 0478fc9c977..8c1b92ffb69 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -24,6 +24,15 @@
#include "utils/guc.h"
#include "utils/pg_lsn.h"
+/* Map SlotSyncSkipReason enum values to human-readable names. */
+static const char *SlotSyncSkipReasonNames[] = {
+ [SS_SKIP_NONE] = "none",
+ [SS_SKIP_WAL_NOT_FLUSHED] = "wal_not_flushed",
+ [SS_SKIP_WAL_OR_ROWS_REMOVED] = "wal_or_rows_removed",
+ [SS_SKIP_NO_CONSISTENT_SNAPSHOT] = "no_consistent_snapshot",
+ [SS_SKIP_INVALID] = "slot_invalidated"
+};
+
/*
* Helper function for creating a new physical replication slot with
* given arguments. Note that this function doesn't release the created
@@ -235,7 +244,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +452,11 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ if (slot_contents.slotsync_skip_reason == SS_SKIP_NONE)
+ nulls[i++] = true;
+ else
+ values[i++] = CStringGetTextDatum(SlotSyncSkipReasonNames[slot_contents.slotsync_skip_reason]);
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 66431940700..66af2d96d67 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11519,9 +11519,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slotsync_skip_reason}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 09c69f83d57..5ade8e6f73e 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -71,6 +71,22 @@ typedef enum ReplicationSlotInvalidationCause
/* Maximum number of invalidation causes */
#define RS_INVAL_MAX_CAUSES 4
+/*
+ * When the slot sync worker is running or pg_sync_replication_slots is
+ * called, slot sync can be skipped. This enum lists the possible reasons
+ * for skipping the sync.
+ */
+typedef enum SlotSyncSkipReason
+{
+ SS_SKIP_NONE, /* No skip */
+ SS_SKIP_WAL_NOT_FLUSHED, /* Standby did not flush the wal corresponding
+ * to confirmed flush of remote slot */
+ SS_SKIP_WAL_OR_ROWS_REMOVED, /* Remote slot is behind the local slot */
+ SS_SKIP_NO_CONSISTENT_SNAPSHOT, /* Standby could not build a consistent
+ * snapshot */
+ SS_SKIP_INVALID /* Local slot is invalid */
+} SlotSyncSkipReason;
+
/*
* On-Disk data of a replication slot, preserved across restarts.
*/
@@ -249,6 +265,16 @@ typedef struct ReplicationSlot
*/
XLogRecPtr last_saved_restart_lsn;
+ /*
+ * The reason for last slot sync skip.
+ *
+ * A slotsync skip typically occurs only for temporary slots. For
+ * persistent slots it is extremely rare (e.g., cases like
+ * SS_SKIP_WAL_NOT_FLUSHED or SS_SKIP_WAL_OR_ROWS_REMOVED). Since
+ * temporary slots are dropped after a server restart, there is no value
+ * in persisting the slotsync_skip_reason.
+ */
+ SlotSyncSkipReason slotsync_skip_reason;
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 7d3c82e0a29..25777fa188c 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -1049,6 +1049,12 @@ $standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
$standby2->wait_for_log(
qr/could not synchronize replication slot \"lsub1_slot\"/, $log_offset);
+# Confirm that the slotsync skip reason is updated
+$result = $standby2->safe_psql('postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots WHERE slot_name = 'lsub1_slot'"
+);
+is($result, 'wal_or_rows_removed', "check slot sync skip reason");
+
# Confirm that the slotsync skip statistics is updated
$result = $standby2->safe_psql('postgres',
"SELECT slotsync_skip_count > 0 FROM pg_stat_replication_slots WHERE slot_name = 'lsub1_slot'"
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index c337f0bc30d..94e45dd4d57 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1507,8 +1507,9 @@ pg_replication_slots| SELECT l.slot_name,
l.conflicting,
l.invalidation_reason,
l.failover,
- l.synced
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
+ l.synced,
+ l.slotsync_skip_reason
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, slotsync_skip_reason)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index dfcd619bfee..c7e52cc2191 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2807,6 +2807,7 @@ SlabSlot
SlotInvalidationCauseMap
SlotNumber
SlotSyncCtxStruct
+SlotSyncSkipReason
SlruCtl
SlruCtlData
SlruErrorCause
--
2.34.1
On Wed, 26 Nov 2025 at 15:21, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
I have made some minor changes in documentation and comments. Attached
the updated patch.
Thanks,
Shlok Kyal
Attachments:
v15-0001-Add-slotsync_skip_reason-to-pg_replication_slots.patch (application/octet-stream)
From 20e0694b056f8981987f31af0922e79e295b01a5 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Wed, 26 Nov 2025 08:52:29 +0530
Subject: [PATCH v15] Add slotsync_skip_reason to pg_replication_slots
This patch introduces a new column, slotsync_skip_reason, in the
pg_replication_slots view. It indicates the reason for the last slot
synchronization skip.
---
doc/src/sgml/monitoring.sgml | 4 +-
doc/src/sgml/system-views.sgml | 41 ++++++++++++++++++
src/backend/catalog/system_views.sql | 3 +-
src/backend/replication/logical/slotsync.c | 43 ++++++++++++++++---
src/backend/replication/slot.c | 1 +
src/backend/replication/slotfuncs.c | 18 +++++++-
src/include/catalog/pg_proc.dat | 6 +--
src/include/replication/slot.h | 27 ++++++++++++
.../t/040_standby_failover_slots_sync.pl | 6 +++
src/test/regress/expected/rules.out | 5 ++-
src/tools/pgindent/typedefs.list | 1 +
11 files changed, 141 insertions(+), 14 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dcc8474a7f7..e0556b6baac 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1665,7 +1665,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para>
<para>
Number of times the slot synchronization is skipped. Slot
- synchronization occur only on standby servers and thus this column has
+ synchronization occurs only on standby servers and thus this column has
no meaning on the primary server.
</para>
</entry>
@@ -1677,7 +1677,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para>
<para>
Time at which last slot synchronization was skipped. Slot
- synchronization occur only on standby servers and thus this column has
+ synchronization occurs only on standby servers and thus this column has
no meaning on the primary server.
</para>
</entry>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 0e623e7fb86..a7db2291df5 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3102,6 +3102,47 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ The reason for the last slot synchronization skip. Slot
+ synchronization occurs only on standby servers and thus this column
+ has no meaning on the primary server. It is relevant mainly for
+ logical slots on standby servers whose <structfield>synced</structfield>
+ field is <literal>true</literal>. <literal>NULL</literal> if
+ slot synchronization is successful. Possible values are:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <literal>wal_or_rows_removed</literal> means that the required WALs or
+ catalog rows have already been removed or are at the risk of removal
+ from the standby.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>wal_not_flushed</literal> means that the standby had not
+ flushed the WAL corresponding to the position reserved on the failover
+ slot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>no_consistent_snapshot</literal> means that the standby could
+ not build a consistent snapshot to decode WALs from
+ <structfield>restart_lsn</structfield>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>slot_invalidated</literal> means that the slot is invalidated.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6fffdb9398e..086c4c8fb6f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1060,7 +1060,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.slotsync_skip_reason
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 1f4f06d467b..53c7d629239 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -148,6 +148,35 @@ typedef struct RemoteSlot
static void slotsync_failure_callback(int code, Datum arg);
static void update_synced_slots_inactive_since(void);
+/*
+ * Update slot sync skip stats. This function requires the caller to acquire
+ * the slot.
+ */
+static void
+update_slotsync_skip_stats(SlotSyncSkipReason skip_reason)
+{
+ ReplicationSlot *slot;
+
+ Assert(MyReplicationSlot);
+
+ slot = MyReplicationSlot;
+
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slot when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslotsync(slot);
+
+ /* Update the slot sync skip reason */
+ if (slot->slotsync_skip_reason != skip_reason)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->slotsync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
+ }
+}
+
/*
* If necessary, update the local synced slot's metadata based on the data
* from the remote slot.
@@ -170,6 +199,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
ReplicationSlot *slot = MyReplicationSlot;
bool updated_xmin_or_lsn = false;
bool updated_config = false;
+ SlotSyncSkipReason skip_reason = SS_SKIP_NONE;
Assert(slot->data.invalidated == RS_INVAL_NONE);
@@ -188,7 +218,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
slot->data.catalog_xmin))
{
/* Update slot sync skip stats */
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_WAL_OR_ROWS_REMOVED);
/*
* This can happen in following situations:
@@ -286,12 +316,15 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
* persisted. See update_and_persist_local_synced_slot().
*/
if (found_consistent_snapshot && !(*found_consistent_snapshot))
- pgstat_report_replslotsync(slot);
+ skip_reason = SS_SKIP_NO_CONSISTENT_SNAPSHOT;
}
updated_xmin_or_lsn = true;
}
+ /* Update slot sync skip stats */
+ update_slotsync_skip_stats(skip_reason);
+
if (remote_dbid != slot->data.database ||
remote_slot->two_phase != slot->data.two_phase ||
remote_slot->failover != slot->data.failover ||
@@ -696,7 +729,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
/* Skip the sync of an invalidated slot */
if (slot->data.invalidated != RS_INVAL_NONE)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_INVALID);
ReplicationSlotRelease();
return slot_updated;
@@ -711,7 +744,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
*/
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
@@ -812,7 +845,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
*/
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1ec1e997b27..86ae99a3ca9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -491,6 +491,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
slot->last_saved_restart_lsn = InvalidXLogRecPtr;
slot->inactive_since = 0;
+ slot->slotsync_skip_reason = SS_SKIP_NONE;
/*
* Create the slot on disk. We haven't actually marked the slot allocated
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 0478fc9c977..7647f051581 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -24,6 +24,17 @@
#include "utils/guc.h"
#include "utils/pg_lsn.h"
+/*
+ * Map SlotSyncSkipReason enum values to human-readable names.
+ */
+static const char *SlotSyncSkipReasonNames[] = {
+ [SS_SKIP_NONE] = "none",
+ [SS_SKIP_WAL_NOT_FLUSHED] = "wal_not_flushed",
+ [SS_SKIP_WAL_OR_ROWS_REMOVED] = "wal_or_rows_removed",
+ [SS_SKIP_NO_CONSISTENT_SNAPSHOT] = "no_consistent_snapshot",
+ [SS_SKIP_INVALID] = "slot_invalidated"
+};
+
/*
* Helper function for creating a new physical replication slot with
* given arguments. Note that this function doesn't release the created
@@ -235,7 +246,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +454,11 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ if (slot_contents.slotsync_skip_reason == SS_SKIP_NONE)
+ nulls[i++] = true;
+ else
+ values[i++] = CStringGetTextDatum(SlotSyncSkipReasonNames[slot_contents.slotsync_skip_reason]);
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 66431940700..66af2d96d67 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11519,9 +11519,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slotsync_skip_reason}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 09c69f83d57..01d949bb3c1 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -71,6 +71,23 @@ typedef enum ReplicationSlotInvalidationCause
/* Maximum number of invalidation causes */
#define RS_INVAL_MAX_CAUSES 4
+/*
+ * When slot sync worker is running or pg_sync_replication_slots is run, the
+ * slot sync can be skipped. This enum keeps a list of reasons of slot sync
+ * skip.
+ */
+typedef enum SlotSyncSkipReason
+{
+ SS_SKIP_NONE, /* No skip */
+ SS_SKIP_WAL_NOT_FLUSHED, /* Standby did not flush the wal corresponding
+ * to confirmed flush of remote slot */
+ SS_SKIP_WAL_OR_ROWS_REMOVED, /* Remote slot is behind; required WAL or
+ * rows may be removed or at risk */
+ SS_SKIP_NO_CONSISTENT_SNAPSHOT, /* Standby could not build a consistent
+ * snapshot */
+ SS_SKIP_INVALID /* Local slot is invalid */
+} SlotSyncSkipReason;
+
/*
* On-Disk data of a replication slot, preserved across restarts.
*/
@@ -249,6 +266,16 @@ typedef struct ReplicationSlot
*/
XLogRecPtr last_saved_restart_lsn;
+ /*
+ * The reason for last slot sync skip.
+ *
+ * A slotsync skip typically occurs only for temporary slots. For
+ * persistent slots it is extremely rare (e.g., cases like
+ * SS_SKIP_WAL_NOT_FLUSHED or SS_SKIP_WAL_OR_ROWS_REMOVED). Since
+ * temporary slots are dropped after a server restart, there is no value
+ * in persisting the slotsync_skip_reason.
+ */
+ SlotSyncSkipReason slotsync_skip_reason;
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 7d3c82e0a29..25777fa188c 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -1049,6 +1049,12 @@ $standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
$standby2->wait_for_log(
qr/could not synchronize replication slot \"lsub1_slot\"/, $log_offset);
+# Confirm that the slotsync skip reason is updated
+$result = $standby2->safe_psql('postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots WHERE slot_name = 'lsub1_slot'"
+);
+is($result, 'wal_or_rows_removed', "check slot sync skip reason");
+
# Confirm that the slotsync skip statistics is updated
$result = $standby2->safe_psql('postgres',
"SELECT slotsync_skip_count > 0 FROM pg_stat_replication_slots WHERE slot_name = 'lsub1_slot'"
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index c337f0bc30d..94e45dd4d57 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1507,8 +1507,9 @@ pg_replication_slots| SELECT l.slot_name,
l.conflicting,
l.invalidation_reason,
l.failover,
- l.synced
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
+ l.synced,
+ l.slotsync_skip_reason
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, slotsync_skip_reason)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e3c3523b5b2..cf3f6a7dafd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2807,6 +2807,7 @@ SlabSlot
SlotInvalidationCauseMap
SlotNumber
SlotSyncCtxStruct
+SlotSyncSkipReason
SlruCtl
SlruCtlData
SlruErrorCause
--
2.34.1
On Thu, Nov 27, 2025 at 9:25 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
I have made some minor changes in documentation and comments. Attached
the updated patch.
1)
<literal>NULL</literal> if
slot synchronization is successful.
We shall add 'It is' to maintain continuity as the previous sentence has it.
2)
Related to previous patch: pgstat_report_replslotsync() currently has
sanity check Assert(SlotIsLogical(slot));
Instead, shall we have Assert(slot->synced)? It will implicitly ensure
that the slot is logical, plus it is important to check that we are
updating stats of synced slot only.
thanks
Shveta
On Thu, 27 Nov 2025 at 10:42, shveta malik <shveta.malik@gmail.com> wrote:
On Thu, Nov 27, 2025 at 9:25 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
I have made some minor changes in documentation and comments. Attached
the updated patch.
1)
<literal>NULL</literal> if
slot synchronization is successful.
We shall add 'It is' to maintain continuity as the previous sentence has it.
2)
Related to previous patch: pgstat_report_replslotsync() currently has
sanity check Assert(SlotIsLogical(slot));
Instead, shall we have Assert(slot->synced)? It will implicitly ensure
that the slot is logical, plus it is important to check that we are
updating stats of synced slot only.
Hi Shveta,
I have addressed the comments and attached the updated patch.
Thanks,
Shlok Kyal
Attachments:
v16-0001-Add-slotsync_skip_reason-to-pg_replication_slots.patch (application/octet-stream)
From d85cc22825c52acd6c5ae281350357584a948aef Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Wed, 26 Nov 2025 08:52:29 +0530
Subject: [PATCH v16] Add slotsync_skip_reason to pg_replication_slots
This patch introduces a new column slotsync_skip_reason to
pg_replication_slots view. This indicates the reason for last slot
synchronization skip.
---
doc/src/sgml/monitoring.sgml | 4 +-
doc/src/sgml/system-views.sgml | 42 ++++++++++++++++++
src/backend/catalog/system_views.sql | 3 +-
src/backend/replication/logical/slotsync.c | 43 ++++++++++++++++---
src/backend/replication/slot.c | 1 +
src/backend/replication/slotfuncs.c | 18 +++++++-
src/backend/utils/activity/pgstat_replslot.c | 4 +-
src/include/catalog/pg_proc.dat | 6 +--
src/include/replication/slot.h | 27 ++++++++++++
.../t/040_standby_failover_slots_sync.pl | 6 +++
src/test/regress/expected/rules.out | 5 ++-
src/tools/pgindent/typedefs.list | 1 +
12 files changed, 144 insertions(+), 16 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dcc8474a7f7..e0556b6baac 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1665,7 +1665,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para>
<para>
Number of times the slot synchronization is skipped. Slot
- synchronization occur only on standby servers and thus this column has
+ synchronization occurs only on standby servers and thus this column has
no meaning on the primary server.
</para>
</entry>
@@ -1677,7 +1677,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para>
<para>
Time at which last slot synchronization was skipped. Slot
- synchronization occur only on standby servers and thus this column has
+ synchronization occurs only on standby servers and thus this column has
no meaning on the primary server.
</para>
</entry>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 0e623e7fb86..1fbf83079f0 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3102,6 +3102,48 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slotsync_skip_reason</structfield><type>text</type>
+ </para>
+ <para>
+ The reason for the last slot synchronization skip. Slot
+ synchronization occurs only on standby servers and thus this column has
+ no meaning on the primary server. It is relevant mainly for logical slots
+ on standby servers whose <structfield>synced</structfield> field is
+ <literal>true</literal>. It is <literal>NULL</literal> if slot
+ synchronization is successful.
+ Possible values are:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <literal>wal_or_rows_removed</literal> means that the required WALs or
+ catalog rows have already been removed or are at the risk of removal
+ from the standby.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>wal_not_flushed</literal> means that the standby had not
+ flushed the WAL corresponding to the position reserved on the failover
+ slot.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>no_consistent_snapshot</literal> means that the standby could
+ not build a consistent snapshot to decode WALs from
+ <structfield>restart_lsn</structfield>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>slot_invalidated</literal> means that the slot is invalidated.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6fffdb9398e..086c4c8fb6f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1060,7 +1060,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.slotsync_skip_reason
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 1f4f06d467b..53c7d629239 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -148,6 +148,35 @@ typedef struct RemoteSlot
static void slotsync_failure_callback(int code, Datum arg);
static void update_synced_slots_inactive_since(void);
+/*
+ * Update slot sync skip stats. This function requires the caller to acquire
+ * the slot.
+ */
+static void
+update_slotsync_skip_stats(SlotSyncSkipReason skip_reason)
+{
+ ReplicationSlot *slot;
+
+ Assert(MyReplicationSlot);
+
+ slot = MyReplicationSlot;
+
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slot when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslotsync(slot);
+
+ /* Update the slot sync skip reason */
+ if (slot->slotsync_skip_reason != skip_reason)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->slotsync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
+ }
+}
+
/*
* If necessary, update the local synced slot's metadata based on the data
* from the remote slot.
@@ -170,6 +199,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
ReplicationSlot *slot = MyReplicationSlot;
bool updated_xmin_or_lsn = false;
bool updated_config = false;
+ SlotSyncSkipReason skip_reason = SS_SKIP_NONE;
Assert(slot->data.invalidated == RS_INVAL_NONE);
@@ -188,7 +218,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
slot->data.catalog_xmin))
{
/* Update slot sync skip stats */
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_WAL_OR_ROWS_REMOVED);
/*
* This can happen in following situations:
@@ -286,12 +316,15 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
* persisted. See update_and_persist_local_synced_slot().
*/
if (found_consistent_snapshot && !(*found_consistent_snapshot))
- pgstat_report_replslotsync(slot);
+ skip_reason = SS_SKIP_NO_CONSISTENT_SNAPSHOT;
}
updated_xmin_or_lsn = true;
}
+ /* Update slot sync skip stats */
+ update_slotsync_skip_stats(skip_reason);
+
if (remote_dbid != slot->data.database ||
remote_slot->two_phase != slot->data.two_phase ||
remote_slot->failover != slot->data.failover ||
@@ -696,7 +729,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
/* Skip the sync of an invalidated slot */
if (slot->data.invalidated != RS_INVAL_NONE)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_INVALID);
ReplicationSlotRelease();
return slot_updated;
@@ -711,7 +744,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
*/
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
@@ -812,7 +845,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
*/
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
- pgstat_report_replslotsync(slot);
+ update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
/*
* Can get here only if GUC 'synchronized_standby_slots' on the
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1ec1e997b27..86ae99a3ca9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -491,6 +491,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
slot->last_saved_restart_lsn = InvalidXLogRecPtr;
slot->inactive_since = 0;
+ slot->slotsync_skip_reason = SS_SKIP_NONE;
/*
* Create the slot on disk. We haven't actually marked the slot allocated
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 0478fc9c977..7647f051581 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -24,6 +24,17 @@
#include "utils/guc.h"
#include "utils/pg_lsn.h"
+/*
+ * Map SlotSyncSkipReason enum values to human-readable names.
+ */
+static const char *SlotSyncSkipReasonNames[] = {
+ [SS_SKIP_NONE] = "none",
+ [SS_SKIP_WAL_NOT_FLUSHED] = "wal_not_flushed",
+ [SS_SKIP_WAL_OR_ROWS_REMOVED] = "wal_or_rows_removed",
+ [SS_SKIP_NO_CONSISTENT_SNAPSHOT] = "no_consistent_snapshot",
+ [SS_SKIP_INVALID] = "slot_invalidated"
+};
+
/*
* Helper function for creating a new physical replication slot with
* given arguments. Note that this function doesn't release the created
@@ -235,7 +246,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +454,11 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ if (slot_contents.slotsync_skip_reason == SS_SKIP_NONE)
+ nulls[i++] = true;
+ else
+ values[i++] = CStringGetTextDatum(SlotSyncSkipReasonNames[slot_contents.slotsync_skip_reason]);
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index f93179146c2..e08d33e8b4c 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -115,8 +115,8 @@ pgstat_report_replslotsync(ReplicationSlot *slot)
PgStatShared_ReplSlot *shstatent;
PgStat_StatReplSlotEntry *statent;
- /* Slot sync stats are valid only for logical slots on standby. */
- Assert(SlotIsLogical(slot));
+ /* Slot sync stats are valid only for synced logical slots on standby. */
+ Assert(slot->data.synced);
Assert(RecoveryInProgress());
entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 66431940700..66af2d96d67 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11519,9 +11519,9 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slotsync_skip_reason}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 09c69f83d57..01d949bb3c1 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -71,6 +71,23 @@ typedef enum ReplicationSlotInvalidationCause
/* Maximum number of invalidation causes */
#define RS_INVAL_MAX_CAUSES 4
+/*
+ * When slot sync worker is running or pg_sync_replication_slots is run, the
+ * slot sync can be skipped. This enum keeps a list of reasons of slot sync
+ * skip.
+ */
+typedef enum SlotSyncSkipReason
+{
+ SS_SKIP_NONE, /* No skip */
+ SS_SKIP_WAL_NOT_FLUSHED, /* Standby did not flush the wal corresponding
+ * to confirmed flush of remote slot */
+ SS_SKIP_WAL_OR_ROWS_REMOVED, /* Remote slot is behind; required WAL or
+ * rows may be removed or at risk */
+ SS_SKIP_NO_CONSISTENT_SNAPSHOT, /* Standby could not build a consistent
+ * snapshot */
+ SS_SKIP_INVALID /* Local slot is invalid */
+} SlotSyncSkipReason;
+
/*
* On-Disk data of a replication slot, preserved across restarts.
*/
@@ -249,6 +266,16 @@ typedef struct ReplicationSlot
*/
XLogRecPtr last_saved_restart_lsn;
+ /*
+ * The reason for last slot sync skip.
+ *
+ * A slotsync skip typically occurs only for temporary slots. For
+ * persistent slots it is extremely rare (e.g., cases like
+ * SS_SKIP_WAL_NOT_FLUSHED or SS_SKIP_WAL_OR_ROWS_REMOVED). Since
+ * temporary slots are dropped after a server restart, there is no value
+ * in persisting the slotsync_skip_reason.
+ */
+ SlotSyncSkipReason slotsync_skip_reason;
} ReplicationSlot;
#define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 7d3c82e0a29..25777fa188c 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -1049,6 +1049,12 @@ $standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
$standby2->wait_for_log(
qr/could not synchronize replication slot \"lsub1_slot\"/, $log_offset);
+# Confirm that the slotsync skip reason is updated
+$result = $standby2->safe_psql('postgres',
+ "SELECT slotsync_skip_reason FROM pg_replication_slots WHERE slot_name = 'lsub1_slot'"
+);
+is($result, 'wal_or_rows_removed', "check slot sync skip reason");
+
# Confirm that the slotsync skip statistics is updated
$result = $standby2->safe_psql('postgres',
"SELECT slotsync_skip_count > 0 FROM pg_stat_replication_slots WHERE slot_name = 'lsub1_slot'"
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index c337f0bc30d..94e45dd4d57 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1507,8 +1507,9 @@ pg_replication_slots| SELECT l.slot_name,
l.conflicting,
l.invalidation_reason,
l.failover,
- l.synced
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
+ l.synced,
+ l.slotsync_skip_reason
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, slotsync_skip_reason)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e3c3523b5b2..cf3f6a7dafd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2807,6 +2807,7 @@ SlabSlot
SlotInvalidationCauseMap
SlotNumber
SlotSyncCtxStruct
+SlotSyncSkipReason
SlruCtl
SlruCtlData
SlruErrorCause
--
2.34.1
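With a patch along these lines applied, the skip reason becomes visible from SQL on the standby instead of only in the server log. A minimal sketch, assuming the column and value names from the patch above:

```sql
-- Show synced slots whose synchronization is currently being skipped,
-- together with the reason (NULL means the last sync was not skipped).
SELECT slot_name, synced, slotsync_skip_reason
FROM pg_replication_slots
WHERE synced AND slotsync_skip_reason IS NOT NULL;
```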
Hi,
On Fri, Nov 21, 2025 at 11:00:36AM +0530, shveta malik wrote:
2)
+ s.slotsync_skip_count,
+ s.last_slotsync_skip_at,
Shall we rename last_slotsync_skip_at to slotsync_last_skip_at. That
way all slotsync related stats columns will have same prefix.
I am not sure that was a great change. AFAICT, we only have _at once in
the catalog, and that is for two_phase_at. There, AIUI, it marks the
specific timestamp two_phase was enabled(?) for logical replication, not
the last time something happened.
So I think using _at here as well is confusing as this one is about the
last time a slotsync was skipped. I think it should be renamed to
'last', as per our usual naming.
I agree that having the same prefix would be nice, but it looks like
almost all other columns are named last_something (except for
checksum_last_failure, but that's been around for a long time).
So I suggest to reopen the discussion about naming this (second) column.
Michael
Hi,
On Thu, Nov 27, 2025 at 11:08:00AM +0530, Shlok Kyal wrote:
I have addressed the comments and attached the updated patch.
Maybe it would be better to break out the slotsync parts of
pg_replication_slots into a new view? I think people are using
pg_replication_slots a lot for monitoring and keeping an eye on
replication and having it much wider makes it somewhat less useful.
Just a suggestion.
Michael
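A minimal sketch of how such a separate, slotsync-focused view might look, assuming the slotsync_skip_reason output column from the patch; the view name and column list here are purely illustrative:

```sql
-- Illustrative sketch: keep pg_replication_slots unchanged and expose
-- the slot-sync details through a narrower, sync-specific view.
CREATE VIEW pg_replication_slot_sync_status AS
    SELECT L.slot_name,
           L.synced,
           L.slotsync_skip_reason
    FROM pg_get_replication_slots() AS L
    WHERE L.slot_type = 'logical';
```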
On Fri, Nov 28, 2025 at 2:45 PM Michael Banck <mbanck@gmx.net> wrote:
Hi,
On Fri, Nov 21, 2025 at 11:00:36AM +0530, shveta malik wrote:
2)
+ s.slotsync_skip_count,
+ s.last_slotsync_skip_at,
Shall we rename last_slotsync_skip_at to slotsync_last_skip_at. That
way all slotsync related stats columns will have same prefix.
I am not sure that was a great change. AFAICT, we only have _at once in
the catalog, and that is for two_phase_at. There, AIUI, it marks the
specific timestamp two_phase was enabled(?) for logical replication, not
the last time something happened.
So I think using _at here as well is confusing as this one is about the
last time a slotsync was skipped. I think it should be renamed to
'last', as per our usual naming.
I agree that having the same prefix would be nice, but it looks like
almost all other columns are named last_something (except for
checksum_last_failure, but that's been around for a long time).
I suggested the current one because having the last was making the
column name bit longer, and anyway the description clarifies it, but I
see your point. So, the other options could be
slotsync_last_skip_time, sync_last_skip_time, last_slotsync_skip_time,
last_sync_skip_time. The first two make it easier to query similar
columns (slotsync related) together and the latter two make it similar
to existing ones in other views. The current one keeps the name
shorter and makes it easily queryable with other slotsync columns.
--
With Regards,
Amit Kapila.
Hi,
On Fri, Nov 28, 2025 at 03:30:48PM +0530, Amit Kapila wrote:
I suggested the current one because having the last was making the
column name bit longer, and anyway the description clarifies it, but I
see your point. So, the other options could be
slotsync_last_skip_time, sync_last_skip_time, last_slotsync_skip_time,
last_sync_skip_time.
I also noticed while going through src/backend/catalog/system_views.sql
that *_last_*_time is rare, so in terms of brevity, removing the _time
at the end would be ok. "last_" already conveys time/a timestamp.
Michael
On Fri, Nov 28, 2025 at 3:37 PM Michael Banck <mbanck@gmx.net> wrote:
Hi,
On Fri, Nov 28, 2025 at 03:30:48PM +0530, Amit Kapila wrote:
I suggested the current one because having the last was making the
column name bit longer, and anyway the description clarifies it, but I
see your point. So, the other options could be
slotsync_last_skip_time, sync_last_skip_time, last_slotsync_skip_time,
last_sync_skip_time.
I also noticed while going through src/backend/catalog/system_views.sql
that *_last_*_time is rare, so in terms of brevity, removing the _time
at the end would be ok. "last_" already conveys time/a timestamp.
I think it depends on case to case but having last in the similar
cases seems to be a common practice. So, again thinking about it based
on your suggestion and looking at existing fields, I suggest we should
rename slotsync_skip_at to slotsync_last_skip. This is similar to
checksum_last_failure. I think there is a value in keeping initials
the same for similar fields in the view as users could easily identify
the related columns while querying the view. For example,
checksum_failures and checksum_last_failure in pg_stat_database.
Anyone else have any opinion on the names proposed here?
--
With Regards,
Amit Kapila.
On Wed, Dec 3, 2025 at 10:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 28, 2025 at 3:37 PM Michael Banck <mbanck@gmx.net> wrote:
Hi,
On Fri, Nov 28, 2025 at 03:30:48PM +0530, Amit Kapila wrote:
I suggested the current one because having the last was making the
column name bit longer, and anyway the description clarifies it, but I
see your point. So, the other options could be
slotsync_last_skip_time, sync_last_skip_time, last_slotsync_skip_time,
last_sync_skip_time.
I also noticed while going through src/backend/catalog/system_views.sql
that *_last_*_time is rare, so in terms of brevity, removing the _time
at the end would be ok. "last_" already conveys time/a timestamp.
I think it depends on case to case but having last in the similar
cases seems to be a common practice. So, again thinking about it based
on your suggestion and looking at existing fields, I suggest we should
rename slotsync_skip_at to slotsync_last_skip. This is similar to
checksum_last_failure. I think there is a value in keeping initials
the same for similar fields in the view as users could easily identify
the related columns while querying the view. For example,
checksum_failures and checksum_last_failure in pg_stat_database.
Anyone else have any opinion on the names proposed here?
IMHO keeping it 'slotsync_last_skip' makes more sense, so that we can
keep the *slotsync* prefix and the naming style also match with some
other usage e.g. 'checksum_last_failure' as well. I see there are
more common examples where the name starts with 'last_' but I prefer
the 'slotsync_last_skip' name.
--
Regards,
Dilip Kumar
Google
On Thu, 4 Dec 2025 at 08:54, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Dec 3, 2025 at 10:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 28, 2025 at 3:37 PM Michael Banck <mbanck@gmx.net> wrote:
Hi,
On Fri, Nov 28, 2025 at 03:30:48PM +0530, Amit Kapila wrote:
I suggested the current one because having the last was making the
column name bit longer, and anyway the description clarifies it, but I
see your point. So, the other options could be
slotsync_last_skip_time, sync_last_skip_time, last_slotsync_skip_time,
last_sync_skip_time.
I also noticed while going through src/backend/catalog/system_views.sql
that *_last_*_time is rare, so in terms of brevity, removing the _time
at the end would be ok. "last_" already conveys time/a timestamp.
I think it depends on case to case but having last in the similar
cases seems to be a common practice. So, again thinking about it based
on your suggestion and looking at existing fields, I suggest we should
rename slotsync_skip_at to slotsync_last_skip. This is similar to
checksum_last_failure. I think there is a value in keeping initials
the same for similar fields in the view as users could easily identify
the related columns while querying the view. For example,
checksum_failures and checksum_last_failure in pg_stat_database.
Anyone else have any opinion on the names proposed here?
IMHO keeping it 'slotsync_last_skip' makes more sense, so that we can
keep the *slotsync* prefix and the naming style also match with some
other usage e.g. 'checksum_last_failure' as well. I see there are
more common examples where the name starts with 'last_' but I prefer
the 'slotsync_last_skip' name.
I also agree with Amit and Dilip. I have made the change and attached
the patch for the same.
Thanks,
Shlok Kyal
Attachments:
v1-0001-Rename-column-slotsync_skip_at-to-slotsync_last_s.patch (application/octet-stream)
From d6d5a31a8d1d6d2bf25ffd79bb351206da64bf2a Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Wed, 3 Dec 2025 14:27:51 +0530
Subject: [PATCH v1] Rename column slotsync_skip_at to slotsync_last_skip
This patch renames column slotsync_skip_at of view
pg_stat_replication_slots to slotsync_last_skip to make it consistent
with similar columns names.
---
contrib/test_decoding/expected/stats.out | 12 ++++++------
doc/src/sgml/monitoring.sgml | 2 +-
src/backend/catalog/system_views.sql | 2 +-
src/backend/utils/activity/pgstat_replslot.c | 2 +-
src/backend/utils/adt/pgstatfuncs.c | 6 +++---
src/include/catalog/pg_proc.dat | 2 +-
src/include/pgstat.h | 2 +-
src/test/regress/expected/rules.out | 4 ++--
8 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index e5117f88a14..a9ead3c41aa 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -78,17 +78,17 @@ SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count,
-- verify accessing/resetting stats for non-existent slot does something reasonable
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slotsync_skip_count | slotsync_skip_at | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+---------------------+------------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slotsync_skip_count | slotsync_last_skip | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+---------------------+--------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
SELECT pg_stat_reset_replication_slot('do-not-exist');
ERROR: replication slot "do-not-exist" does not exist
SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
- slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slotsync_skip_count | slotsync_skip_at | stats_reset
---------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+---------------------+------------------+-------------
- do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
+ slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | mem_exceeded_count | total_txns | total_bytes | slotsync_skip_count | slotsync_last_skip | stats_reset
+--------------+------------+-------------+-------------+-------------+--------------+--------------+--------------------+------------+-------------+---------------------+--------------------+-------------
+ do-not-exist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
(1 row)
-- spilling the xact
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 039d73691be..d2dd5e28365 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1671,7 +1671,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
<row>
<entry role="catalog_table_entry"><para role="column_definition">
- <structfield>slotsync_skip_at</structfield><type>timestamp with time zone</type>
+ <structfield>slotsync_last_skip</structfield><type>timestamp with time zone</type>
</para>
<para>
Time at which last slot synchronization was skipped. Slot
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 086c4c8fb6f..48af8ee90a6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1078,7 +1078,7 @@ CREATE VIEW pg_stat_replication_slots AS
s.total_txns,
s.total_bytes,
s.slotsync_skip_count,
- s.slotsync_skip_at,
+ s.slotsync_last_skip,
s.stats_reset
FROM pg_replication_slots as r,
LATERAL pg_stat_get_replication_slot(slot_name) as s
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index e08d33e8b4c..d757e00eb54 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -127,7 +127,7 @@ pgstat_report_replslotsync(ReplicationSlot *slot)
statent = &shstatent->stats;
statent->slotsync_skip_count += 1;
- statent->slotsync_skip_at = GetCurrentTimestamp();
+ statent->slotsync_last_skip = GetCurrentTimestamp();
pgstat_unlock_entry(entry_ref);
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 7e2ed69138a..ef6fffe60b9 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2162,7 +2162,7 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 11, "slotsync_skip_count",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 12, "slotsync_skip_at",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 12, "slotsync_last_skip",
TIMESTAMPTZOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 13, "stats_reset",
TIMESTAMPTZOID, -1, 0);
@@ -2192,10 +2192,10 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
values[9] = Int64GetDatum(slotent->total_bytes);
values[10] = Int64GetDatum(slotent->slotsync_skip_count);
- if (slotent->slotsync_skip_at == 0)
+ if (slotent->slotsync_last_skip == 0)
nulls[11] = true;
else
- values[11] = TimestampTzGetDatum(slotent->slotsync_skip_at);
+ values[11] = TimestampTzGetDatum(slotent->slotsync_last_skip);
if (slotent->stat_reset_timestamp == 0)
nulls[12] = true;
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 66af2d96d67..fd9448ec7b9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5693,7 +5693,7 @@
proparallel => 'r', prorettype => 'record', proargtypes => 'text',
proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz,timestamptz}',
proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,mem_exceeded_count,total_txns,total_bytes,slotsync_skip_count,slotsync_skip_at,stats_reset}',
+ proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,mem_exceeded_count,total_txns,total_bytes,slotsync_skip_count,slotsync_last_skip,stats_reset}',
prosrc => 'pg_stat_get_replication_slot' },
{ oid => '6230', descr => 'statistics: check if a stats object exists',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index ad85134f27a..f23dd5870da 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -401,7 +401,7 @@ typedef struct PgStat_StatReplSlotEntry
PgStat_Counter total_txns;
PgStat_Counter total_bytes;
PgStat_Counter slotsync_skip_count;
- TimestampTz slotsync_skip_at;
+ TimestampTz slotsync_last_skip;
TimestampTz stat_reset_timestamp;
} PgStat_StatReplSlotEntry;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 94e45dd4d57..85d795dbd63 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2153,10 +2153,10 @@ pg_stat_replication_slots| SELECT s.slot_name,
s.total_txns,
s.total_bytes,
s.slotsync_skip_count,
- s.slotsync_skip_at,
+ s.slotsync_last_skip,
s.stats_reset
FROM pg_replication_slots r,
- LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, mem_exceeded_count, total_txns, total_bytes, slotsync_skip_count, slotsync_skip_at, stats_reset)
+ LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, mem_exceeded_count, total_txns, total_bytes, slotsync_skip_count, slotsync_last_skip, stats_reset)
WHERE (r.datoid IS NOT NULL);
pg_stat_slru| SELECT name,
blks_zeroed,
--
2.34.1
Hi,
On Thu, Dec 04, 2025 at 12:06:44PM +0530, Shlok Kyal wrote:
On Thu, 4 Dec 2025 at 08:54, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Dec 3, 2025 at 10:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 28, 2025 at 3:37 PM Michael Banck <mbanck@gmx.net> wrote:
On Fri, Nov 28, 2025 at 03:30:48PM +0530, Amit Kapila wrote:
I suggested the current one because having "last" was making the
column name a bit longer, and anyway the description clarifies it, but I
see your point. So, the other options could be
slotsync_last_skip_time, sync_last_skip_time, last_slotsync_skip_time,
last_sync_skip_time.
I also noticed while going through src/backend/catalog/system_views.sql
that *_last_*_time is rare, so in terms of brevity, removing the _time
at the end would be ok. "last_" already conveys time/a timestamp.
I think it depends on the case, but having "last" in similar
cases seems to be a common practice. So, again thinking about it based
on your suggestion and looking at existing fields, I suggest we
rename slotsync_skip_at to slotsync_last_skip. This is similar to
checksum_last_failure. I think there is value in keeping the initials
the same for similar fields in the view, as users can easily identify
the related columns while querying the view. For example,
checksum_failures and checksum_last_failure in pg_stat_database.
Anyone else have any opinion on the names proposed here?
IMHO keeping it 'slotsync_last_skip' makes more sense, so that we can
keep the *slotsync* prefix, and the naming style also matches some
other usage, e.g. 'checksum_last_failure'. I see there are
more common examples where the name starts with 'last_', but I prefer
the 'slotsync_last_skip' name.
I also agree with Amit and Dilip. I have made the change and attached
the patch for the same.
LGTM.
Michael
On Thu, Dec 4, 2025 at 5:59 PM Michael Banck <mbanck@gmx.net> wrote:
LGTM.
Pushed.
--
With Regards,
Amit Kapila.
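For reference, with the renamed column in place, an end user can check skipped synchronizations from SQL instead of the server logs. A minimal query sketch against the pg_stat_replication_slots view as modified by this patch (columns slotsync_skip_count and slotsync_last_skip come from the diff above):

```sql
-- Show slots whose synchronization has been skipped at least once,
-- most recently skipped first.
SELECT slot_name,
       slotsync_skip_count,
       slotsync_last_skip
FROM pg_stat_replication_slots
WHERE slotsync_skip_count > 0
ORDER BY slotsync_last_skip DESC NULLS LAST;
```

Note this only reports that a sync was skipped and when; the reason for the skip is still only available in the server logs, which is what the proposed sync_skip_reason column would address.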
On Fri, Nov 28, 2025 at 2:59 PM Michael Banck <mbanck@gmx.net> wrote:
On Thu, Nov 27, 2025 at 11:08:00AM +0530, Shlok Kyal wrote:
I have addressed the comments and attached the updated patch.
Maybe it would be better to break out the slotsync parts of
pg_replication_slots into a new view?
We have another view pg_stat_replication_slots, for the statistics of
slots, yet another might not make sense at this stage. If we decide to
have more statistics/monitoring columns for slots that require some
separation, then we can consider moving this one as well.
--
With Regards,
Amit Kapila.