Introduce XID age based replication slot invalidation

Started by John H, 4 months ago, 4 messages

johnhyvr@gmail.com

4 months ago

1 attachment(s)

Hi folks,

I'd like to restart the discussion about providing an XID-based slot
invalidation mechanism. The previous effort [1] proposed both XID-based
and time-based invalidation, and the inactive-time-based approach was
implemented first. The latest XID-based patch from Bharath Rupireddy
can be found here [2].

When thinking about availability of the database, inactive replication
slots cause two main pain points:
1) WAL accumulation
2) Replication slots with xmin/catalog_xmin can hold back vacuuming
leading to wrap-around

The first issue can be mitigated by 'max_slot_wal_keep_size'. However
in the second case there are no good mechanisms to prioritize write
availability of the database and avoid wraparound. The new GUC
'idle_replication_slot_timeout' partially addresses the concern if you
have similar workloads. However it's hard to set the same setting
across a fleet of different applications.

It's easy to imagine a high-XID-churn workload in one cluster while
another runs large batch jobs whose changes get synced out
periodically. There isn't a one-size-fits-all setting for
'idle_replication_slot_timeout' in these two cases.

The attached patch addresses this by introducing 'max_slot_xid_age' in
a similar fashion. Replication slots whose xmin or catalog_xmin is older
than the configured age get invalidated, allowing vacuum to proceed and
biasing towards database availability.
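
To make the comparison concrete, here is a minimal standalone sketch of the
age check. It is not the patch code itself: it assumes 32-bit XIDs compared
modulo 2^32, and the names slot_xid_is_aged, xid_is_normal and
xid_precedes_or_equals are hypothetical stand-ins for the PostgreSQL
routines the patch relies on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: XIDs are 32-bit counters, with values 0..2 reserved. */
typedef uint32_t TransactionId;

#define FIRST_NORMAL_XID ((TransactionId) 3)

/* True for XIDs outside the reserved range. */
bool
xid_is_normal(TransactionId xid)
{
	return xid >= FIRST_NORMAL_XID;
}

/* Circular "a <= b" for two normal XIDs, via the signed 32-bit difference. */
bool
xid_precedes_or_equals(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) <= 0;
}

/*
 * Hypothetical helper: is slot_xid at least max_age transactions behind
 * next_xid?  A max_age of 0 disables the check, and an unset (non-normal)
 * slot_xid is never considered aged.
 */
bool
slot_xid_is_aged(TransactionId slot_xid, TransactionId next_xid,
				 uint32_t max_age)
{
	TransactionId cutoff;

	if (max_age == 0 || !xid_is_normal(slot_xid))
		return false;

	/* Oldest XID the slot may hold; unsigned subtraction wraps around. */
	cutoff = next_xid - max_age;
	if (!xid_is_normal(cutoff))
		cutoff = FIRST_NORMAL_XID;

	return xid_precedes_or_equals(slot_xid, cutoff);
}
```

With next_xid = 2000 and max_age = 500 the cutoff is 1500, so a slot holding
xmin 1000 counts as aged while one holding 1600 does not.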

Invalidation happens in CHECKPOINT, similar to
'idle_replication_slot_timeout', and when VACUUM occurs.

The patch currently attempts to invalidate once-per-autovacuum worker.
We're wondering if it should attempt invalidation on a per-relation
basis within the vacuum call itself. That would account for scenarios
where the cost_delay or naptime is high between autovac executions.
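
The once-per-worker behavior can be sketched roughly as below. This is a
simplified standalone illustration, not the patch code: in the patch the
state is a static flag inside vacuum(), whereas InvalidationGate and
should_attempt_invalidation are hypothetical names used only here.

```c
#include <stdbool.h>

/* Hypothetical per-process state; the patch keeps a static bool instead. */
typedef struct InvalidationGate
{
	bool		attempted;		/* has this process already tried? */
} InvalidationGate;

/*
 * Decide whether this vacuum call should attempt slot invalidation.
 * An autovacuum worker tries once per process lifetime; a manual VACUUM
 * never latches the flag, so it tries on every call.
 */
bool
should_attempt_invalidation(InvalidationGate *gate, bool is_autovacuum_worker)
{
	if (gate->attempted)
		return false;

	if (is_autovacuum_worker)
		gate->attempted = true;

	return true;
}
```

A per-relation variant would simply drop the latch, which trades more
frequent checks for faster reaction when a worker runs for a long time.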

Thanks,

John H

[1]: /messages/by-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
[2]: /messages/by-id/CALj2ACXe8+xSNdMXTMaSRWUwX7v61Ad4iddUwnn=djSwx3GLLg@mail.gmail.com

--
John Hsu - Amazon Web Services

Attachments:

0044-Add-XID-age-based-replication-slot-invalidation.patch (application/octet-stream)

From cd9cb104041800810d38e21e31d207311f112228 Mon Sep 17 00:00:00 2001
From: John Hsu <johnhyvr@gmail.com>
Date: Fri, 8 Aug 2025 19:48:58 +0000
Subject: [PATCH] Add XID age based replication slot invalidation

This commit introduces max_slot_xid_age GUC
that allows replication slots whose xmin or
catalog_xmin has reached the age specified by
this setting to be invalidated.

Idle or forgotten replication slots can hold
back vacuum operations, leading to bloat or
transaction ID wraparound. Recovering from that
requires dropping the slot and vacuuming in
single-user mode. This setting avoids these
scenarios by proactively invalidating stale
slots on an XID basis.

Invalidation checks happen at various locations
to prevent wrap-around:
- During CHECKPOINT
- During vacuum (including autovacuum)

This change makes it easy for administrators to
protect against wrap-around concerns due to slots
that are falling behind.

Author: Bharath Rupireddy
Author: John Hsu
---
 doc/src/sgml/config.sgml                      |  26 ++
 doc/src/sgml/system-views.sgml                |   8 +
 src/backend/access/transam/xlog.c             |   4 +-
 src/backend/commands/vacuum.c                 |  70 ++++-
 src/backend/replication/slot.c                |  80 +++++-
 src/backend/utils/misc/guc_parameters.dat     |   8 +
 src/include/replication/slot.h                |   7 +-
 .../t/049_invalidate_xid_aged_slots.pl        | 240 ++++++++++++++++++
 8 files changed, 436 insertions(+), 7 deletions(-)
 create mode 100644 src/test/recovery/t/049_invalidate_xid_aged_slots.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e9b420f3ddb..eb07bcd3551 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4653,6 +4653,32 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-max-slot-xid-age" xreflabel="max_slot_xid_age">
+      <term><varname>max_slot_xid_age</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>max_slot_xid_age</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Invalidate replication slots whose <literal>xmin</literal> (the oldest
+        transaction that this slot needs the database to retain) or
+        <literal>catalog_xmin</literal> (the oldest transaction affecting the
+        system catalogs that this slot needs the database to retain) has reached
+        the age specified by this setting. A value of zero (the default)
+        disables this feature. Users can set this value anywhere from zero to
+        two billion. This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command
+        line.
+       </para>
+
+       <para>
+        This invalidation check happens during vacuum (including autovacuum)
+        or during checkpoint.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-wal-sender-timeout" xreflabel="wal_sender_timeout">
       <term><varname>wal_sender_timeout</varname> (<type>integer</type>)
       <indexterm>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 4187191ea74..beeb22a7da4 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3007,6 +3007,14 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
           <xref linkend="guc-idle-replication-slot-timeout"/> duration.
          </para>
         </listitem>
+        <listitem>
+         <para>
+          <literal>xid_aged</literal> means that the slot's
+          <literal>xmin</literal> or <literal>catalog_xmin</literal>
+          has reached the age specified by
+          <xref linkend="guc-max-slot-xid-age"/> parameter.
+         </para>
+        </listitem>
        </itemizedlist>
       </para></entry>
      </row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0baf0ac6160..41a48389afb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7347,7 +7347,7 @@ CreateCheckPoint(int flags)
 	 */
 	XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
 	KeepLogSeg(recptr, &_logSegNo);
-	if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
+	if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT | RS_INVAL_XID_AGE,
 										   _logSegNo, InvalidOid,
 										   InvalidTransactionId))
 	{
@@ -7801,7 +7801,7 @@ CreateRestartPoint(int flags)
 	replayPtr = GetXLogReplayRecPtr(&replayTLI);
 	endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
 	KeepLogSeg(endptr, &_logSegNo);
-	if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
+	if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT | RS_INVAL_XID_AGE,
 										   _logSegNo, InvalidOid,
 										   InvalidTransactionId))
 	{
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7c..91cc08069c8 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -47,6 +47,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/interrupt.h"
+#include "replication/slot.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/pmsignal.h"
@@ -129,6 +130,7 @@ static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
 static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static void try_replication_slot_invalidation(void);
 
 /*
  * GUC check function to ensure GUC value specified is within the allowable
@@ -471,6 +473,56 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 	MemoryContextDelete(vac_context);
 }
 
+/*
+ * Try invalidating replication slots based on current replication slot xmin
+ * limits once every vacuum cycle.
+ */
+static void
+try_replication_slot_invalidation(void)
+{
+
+	TransactionId min_slot_xmin;
+	TransactionId min_slot_catalog_xmin;
+	bool		can_invalidate = false;
+	TransactionId cutoff;
+	TransactionId curr;
+
+	if (max_slot_xid_age == 0)
+		return;
+
+	curr = ReadNextTransactionId();
+
+	/*
+	 * Calculate oldest XID a slot's xmin or catalog_xmin can have before
+	 * they are invalidated.
+	 */
+	cutoff = curr - max_slot_xid_age;
+
+	if (!TransactionIdIsNormal(cutoff))
+		cutoff = FirstNormalTransactionId;
+
+	ProcArrayGetReplicationSlotXmin(&min_slot_xmin, &min_slot_catalog_xmin);
+
+	if (TransactionIdIsNormal(min_slot_xmin) &&
+		TransactionIdPrecedesOrEquals(min_slot_xmin, cutoff))
+		can_invalidate = true;
+	else if (TransactionIdIsNormal(min_slot_catalog_xmin) &&
+			 TransactionIdPrecedesOrEquals(min_slot_catalog_xmin, cutoff))
+		can_invalidate = true;
+
+	if (can_invalidate)
+	{
+		/*
+		 * Note that InvalidateObsoleteReplicationSlots is also called as part
+		 * of CHECKPOINT, and emitting ERRORs from within is avoided already.
+		 * Therefore, there is no concern here that any ERROR from
+		 * invalidating replication slots blocks VACUUM.
+		 */
+		InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE, 0,
+										   InvalidOid, InvalidTransactionId);
+	}
+}
+
 /*
  * Internal entry point for autovacuum and the VACUUM / ANALYZE commands.
  *
@@ -498,7 +550,7 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 	   MemoryContext vac_context, bool isTopLevel)
 {
 	static bool in_vacuum = false;
-
+	static bool first_time = true;
 	const char *stmttype;
 	volatile bool in_outer_xact,
 				use_own_xacts;
@@ -611,6 +663,22 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 		CommitTransactionCommand();
 	}
 
+	if (params.options & VACOPT_VACUUM)
+	{
+		if (first_time)
+			try_replication_slot_invalidation();
+
+		/*
+		 * Every autovacuum worker will attempt to invalidate replication slots once.
+		 * If a replication slot exceeds the age specified by max_slot_xid_age then
+		 * it will only be invalidated once the next worker attempt kicks off. For
+		 * manual VACUUM always attempt invalidation to account for long-lived
+		 * maintenance connections.
+		 */
+		if (AmAutoVacuumWorkerProcess())
+			first_time = false;
+	}
+
 	/* Turn vacuum cost accounting on or off, and set/clear in_vacuum */
 	PG_TRY();
 	{
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fd0fdb96d42..b4fc0ba3702 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -116,6 +116,7 @@ static const SlotInvalidationCauseMap SlotInvalidationCauses[] = {
 	{RS_INVAL_HORIZON, "rows_removed"},
 	{RS_INVAL_WAL_LEVEL, "wal_level_insufficient"},
 	{RS_INVAL_IDLE_TIMEOUT, "idle_timeout"},
+	{RS_INVAL_XID_AGE, "xid_aged"},
 };
 
 /*
@@ -157,6 +158,12 @@ int			max_replication_slots = 10; /* the maximum number of replication
  */
 int			idle_replication_slot_timeout_secs = 0;
 
+/*
+ * Invalidate replication slots whose xmin or catalog_xmin is older than
+ * the specified XID age; '0' disables it.
+ */
+int			max_slot_xid_age = 0;
+
 /*
  * This GUC lists streaming replication standby server slot names that
  * logical WAL sender processes will wait for.
@@ -1620,7 +1627,9 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
 					   XLogRecPtr restart_lsn,
 					   XLogRecPtr oldestLSN,
 					   TransactionId snapshotConflictHorizon,
-					   long slot_idle_seconds)
+					   long slot_idle_seconds,
+					   TransactionId xmin,
+					   TransactionId catalog_xmin)
 {
 	StringInfoData err_detail;
 	StringInfoData err_hint;
@@ -1665,6 +1674,26 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
 								 "idle_replication_slot_timeout");
 				break;
 			}
+
+		case RS_INVAL_XID_AGE:
+			{
+				Assert(TransactionIdIsValid(xmin) || TransactionIdIsValid(catalog_xmin));
+
+				if (TransactionIdIsValid(xmin))
+					appendStringInfo(&err_detail, _("The slot's xmin %u exceeds the maximum xid age %d specified by \"max_slot_xid_age\"."),
+								 xmin,
+								 max_slot_xid_age);
+				else
+					appendStringInfo(&err_detail, _("The slot's catalog_xmin %u exceeds the maximum xid age %d specified by \"max_slot_xid_age\"."),
+								 catalog_xmin,
+								 max_slot_xid_age);
+
+				appendStringInfo(&err_hint, _("You might need to increase \"%s\"."),
+								 "max_slot_xid_age");
+
+
+				break;
+			}
 		case RS_INVAL_NONE:
 			pg_unreachable();
 	}
@@ -1783,6 +1812,13 @@ DetermineSlotInvalidationCause(uint32 possible_causes, ReplicationSlot *s,
 		}
 	}
 
+	if (possible_causes & RS_INVAL_XID_AGE)
+	{
+		/* Safe since we hold the replication slot's spinlock needed to avoid race conditions */
+		if (ReplicationSlotIsXIDAged(s->data.xmin, s->data.catalog_xmin))
+			return RS_INVAL_XID_AGE;
+	}
+
 	return RS_INVAL_NONE;
 }
 
@@ -1972,7 +2008,7 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
 				ReportSlotInvalidation(invalidation_cause, true, active_pid,
 									   slotname, restart_lsn,
 									   oldestLSN, snapshotConflictHorizon,
-									   slot_idle_secs);
+									   slot_idle_secs, s->data.xmin, s->data.catalog_xmin);
 
 				if (MyBackendType == B_STARTUP)
 					(void) SendProcSignal(active_pid,
@@ -2019,7 +2055,7 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
 			ReportSlotInvalidation(invalidation_cause, false, active_pid,
 								   slotname, restart_lsn,
 								   oldestLSN, snapshotConflictHorizon,
-								   slot_idle_secs);
+								   slot_idle_secs, s->data.xmin, s->data.catalog_xmin);
 
 			/* done with this slot for now */
 			break;
@@ -2044,6 +2080,8 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
  * - RS_INVAL_WAL_LEVEL: is logical and wal_level is insufficient
  * - RS_INVAL_IDLE_TIMEOUT: has been idle longer than the configured
  *   "idle_replication_slot_timeout" duration.
+ * - RS_INVAL_XID_AGE: slot's xmin or catalog_xmin is older than the
+ *   configured "max_slot_xid_age"
  *
  * Note: This function attempts to invalidate the slot for multiple possible
  * causes in a single pass, minimizing redundant iterations. The "cause"
@@ -3093,3 +3131,39 @@ WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn)
 
 	ConditionVariableCancelSleep();
 }
+
+/*
+ * Return true if the given xmin or catalog_xmin is older than the age
+ * specified by max_slot_xid_age.
+ */
+bool
+ReplicationSlotIsXIDAged(TransactionId xmin, TransactionId catalog_xmin)
+{
+	TransactionId cutoff;
+	TransactionId curr;
+	bool is_aged = false;
+
+	if (max_slot_xid_age == 0)
+		return false;
+
+	curr = ReadNextTransactionId();
+
+	/*
+	 * Calculate oldest XID a slot's xmin or catalog_xmin can have before
+	 * they are invalidated.
+	 */
+	cutoff = curr - max_slot_xid_age;
+
+	if (!TransactionIdIsNormal(cutoff))
+		cutoff = FirstNormalTransactionId;
+
+	if (TransactionIdIsNormal(xmin) &&
+		TransactionIdPrecedesOrEquals(xmin, cutoff))
+		is_aged = true;
+
+	if (TransactionIdIsNormal(catalog_xmin) &&
+		TransactionIdPrecedesOrEquals(catalog_xmin, cutoff))
+		is_aged = true;
+
+	return is_aged;
+}
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 6bc6be13d2a..c162268c314 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -1717,6 +1717,14 @@
   max => 'INT_MAX',
 },
 
+{ name => 'max_slot_xid_age', type => 'int', context => 'PGC_SIGHUP', group => 'REPLICATION_SENDING',
+  short_desc => 'Age of the transaction ID at which a replication slot gets invalidated.',
+  variable => 'max_slot_xid_age',
+  boot_val => '0',
+  min => '0',
+  max => '2000000000',
+},
+
 # we have no microseconds designation, so can't supply units here
 { name => 'commit_delay', type => 'int', context => 'PGC_SUSET', group => 'WAL_SETTINGS',
   short_desc => 'Sets the delay in microseconds between transaction commit and flushing WAL to disk.',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index fe62162cde3..3253f108ffc 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -66,10 +66,12 @@ typedef enum ReplicationSlotInvalidationCause
 	RS_INVAL_WAL_LEVEL = (1 << 2),
 	/* idle slot timeout has occurred */
 	RS_INVAL_IDLE_TIMEOUT = (1 << 3),
+	/* slot's xmin or catalog_xmin has reached max xid age */
+	RS_INVAL_XID_AGE = (1 << 4),
 } ReplicationSlotInvalidationCause;
 
 /* Maximum number of invalidation causes */
-#define	RS_INVAL_MAX_CAUSES 4
+#define	RS_INVAL_MAX_CAUSES 5
 
 /*
  * On-Disk data of a replication slot, preserved across restarts.
@@ -293,6 +295,7 @@ extern PGDLLIMPORT ReplicationSlot *MyReplicationSlot;
 extern PGDLLIMPORT int max_replication_slots;
 extern PGDLLIMPORT char *synchronized_standby_slots;
 extern PGDLLIMPORT int idle_replication_slot_timeout_secs;
+extern PGDLLIMPORT int max_slot_xid_age;
 
 /* shmem initialization functions */
 extern Size ReplicationSlotsShmemSize(void);
@@ -350,4 +353,6 @@ extern bool SlotExistsInSyncStandbySlots(const char *slot_name);
 extern bool StandbySlotsHaveCaughtup(XLogRecPtr wait_for_lsn, int elevel);
 extern void WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn);
 
+extern bool ReplicationSlotIsXIDAged(TransactionId xmin, TransactionId catalog_xmin);
+
 #endif							/* SLOT_H */
diff --git a/src/test/recovery/t/049_invalidate_xid_aged_slots.pl b/src/test/recovery/t/049_invalidate_xid_aged_slots.pl
new file mode 100644
index 00000000000..f1e8f003f95
--- /dev/null
+++ b/src/test/recovery/t/049_invalidate_xid_aged_slots.pl
@@ -0,0 +1,240 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Test for replication slots invalidation due to XID age
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::BackgroundPsql;
+use PostgreSQL::Test::Utils;
+use PostgreSQL::Test::Cluster;
+use Test::More;
+
+# Wait for slot to first become inactive and then get invalidated
+sub wait_for_slot_invalidation
+{
+	my ($node, $slot_name, $reason) = @_;
+	my $name = $node->name;
+
+	# Wait for the inactive replication slot to be invalidated
+	$node->poll_query_until(
+		'postgres', qq[
+		SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+			WHERE slot_name = '$slot_name' AND
+			invalidation_reason = '$reason';
+	])
+	  or die
+	  "Timed out while waiting for inactive slot $slot_name to be invalidated on node $name";
+}
+
+# Do some work for advancing xids on a given node
+sub advance_xids
+{
+	my ($node, $table_name) = @_;
+
+	$node->safe_psql(
+		'postgres', qq[
+		do \$\$
+		begin
+		for i in 10000..11000 loop
+			-- use an exception block so that each iteration eats an XID
+			begin
+			insert into $table_name values (i);
+			exception
+			when division_by_zero then null;
+			end;
+		end loop;
+		end\$\$;
+	]);
+}
+
+# =============================================================================
+# Testcase start: Invalidate streaming standby's slot due to max_slot_xid_age
+# GUC.
+
+# Initialize primary node
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 'logical');
+
+# Configure primary with XID age settings
+$primary->append_conf(
+	'postgresql.conf', qq{
+max_slot_xid_age = 500
+});
+
+$primary->start;
+
+# Take a backup for creating standby
+my $backup_name = 'backup';
+$primary->backup($backup_name);
+
+# Create a standby linking to the primary using the replication slot
+my $standby = PostgreSQL::Test::Cluster->new('standby');
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+
+# Enable hs_feedback. The slot should gain an xmin. We set the status interval
+# so we'll see the results promptly.
+$standby->append_conf(
+	'postgresql.conf', q{
+primary_slot_name = 'sb_slot'
+hot_standby_feedback = on
+wal_receiver_status_interval = 1
+max_standby_streaming_delay = 3600000
+});
+
+$primary->safe_psql(
+	'postgres', qq[
+    SELECT pg_create_physical_replication_slot(slot_name := 'sb_slot', immediately_reserve := true);
+]);
+
+$standby->start;
+
+# Create some content on primary to move xmin
+$primary->safe_psql('postgres',
+	"CREATE TABLE tab_int AS SELECT generate_series(1,10) AS a");
+
+# Wait until standby has replayed enough data
+$primary->wait_for_catchup($standby);
+
+$primary->poll_query_until(
+	'postgres', qq[
+	SELECT (xmin IS NOT NULL) OR (catalog_xmin IS NOT NULL)
+		FROM pg_catalog.pg_replication_slots
+		WHERE slot_name = 'sb_slot';
+]) or die "Timed out waiting for slot sb_slot xmin to advance";
+
+# Hold back the slot's xmin on the primary so that it ages (standby stays up)
+
+# Read on standby that causes xmin to be held on slot
+my $standby_session = $standby->interactive_psql('postgres');
+$standby_session->query("BEGIN; SET default_transaction_isolation = 'repeatable read'; SELECT * FROM tab_int;");
+
+#$standby->stop;
+
+# Do some work to advance xids on primary
+advance_xids($primary, 'tab_int');
+
+# Run a checkpoint and wait for the replication slot to be invalidated due to
+# XID age.
+$primary->safe_psql('postgres', "CHECKPOINT");
+wait_for_slot_invalidation($primary, 'sb_slot', 'xid_aged');
+
+$standby_session->quit;
+$standby->stop;
+
+# Testcase end: Invalidate streaming standby's slot due to max_slot_xid_age
+# GUC.
+# =============================================================================
+
+# =============================================================================
+# Testcase start: Invalidate logical subscriber's slot due to max_slot_xid_age
+# GUC.
+
+# Create a subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+$subscriber->start;
+
+# Create tables on both primary and subscriber
+$primary->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+$subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Insert some initial data
+$primary->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));");
+
+# Setup logical replication
+my $publisher_connstr = $primary->connstr . ' dbname=postgres';
+$primary->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR TABLE test_tbl");
+
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub WITH (slot_name = 'lsub_slot')"
+);
+
+# Wait for initial sync to complete
+$subscriber->wait_for_subscription_sync($primary, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM test_tbl");
+is($result, qq(5), "check initial copy was done for logical replication");
+
+# Wait for the logical slot to get catalog_xmin (logical slots use catalog_xmin, not xmin)
+$primary->poll_query_until(
+	'postgres', qq[
+	SELECT xmin IS NULL AND catalog_xmin IS NOT NULL
+	FROM pg_catalog.pg_replication_slots
+	WHERE slot_name = 'lsub_slot';
+]) or die "Timed out waiting for slot lsub_slot catalog_xmin to advance";
+
+# Stop subscriber to make the replication slot on primary inactive
+$subscriber->stop;
+
+# Do some work to advance xids on primary
+advance_xids($primary, 'test_tbl');
+
+# Wait for the replication slot to become inactive and then invalidated due to
+# XID age.
+$primary->safe_psql('postgres', "CHECKPOINT");
+wait_for_slot_invalidation($primary, 'lsub_slot', 'xid_aged');
+
+# Testcase end: Invalidate logical subscriber's slot due to max_slot_xid_age
+# GUC.
+# =============================================================================
+
+# =============================================================================
+# Testcase start: Test VACUUM command triggering slot invalidation
+#
+
+# Create another physical replication slot for VACUUM test
+$primary->safe_psql(
+	'postgres', qq[
+    SELECT pg_create_physical_replication_slot(slot_name := 'vacuum_test_slot', immediately_reserve := true);
+]);
+
+# Create a new standby for this test
+my $standby_vacuum = PostgreSQL::Test::Cluster->new('standby_vacuum');
+$standby_vacuum->init_from_backup($primary, $backup_name, has_streaming => 1);
+
+$standby_vacuum->append_conf(
+	'postgresql.conf', q{
+primary_slot_name = 'vacuum_test_slot'
+hot_standby_feedback = on
+wal_receiver_status_interval = 1
+});
+
+$standby_vacuum->start;
+
+# Wait until standby has replayed enough data and slot gets xmin
+$primary->wait_for_catchup($standby_vacuum);
+
+$primary->poll_query_until(
+	'postgres', qq[
+	SELECT (xmin IS NOT NULL) OR (catalog_xmin IS NOT NULL)
+		FROM pg_catalog.pg_replication_slots
+		WHERE slot_name = 'vacuum_test_slot';
+]) or die "Timed out waiting for slot vacuum_test_slot xmin to advance";
+
+# Stop standby so that the replication slot's xmin on primary ages
+$standby_vacuum->stop;
+
+# Do some work to advance xids on primary
+advance_xids($primary, 'tab_int');
+
+# Use VACUUM to trigger slot invalidation (instead of CHECKPOINT)
+# This tests that VACUUM command can trigger XID age invalidation
+$primary->safe_psql('postgres', "VACUUM");
+
+# Wait for the replication slot to become invalidated due to XID age triggered by VACUUM
+$primary->poll_query_until(
+	'postgres', qq[
+	SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+		WHERE slot_name = 'vacuum_test_slot' AND
+		invalidation_reason = 'xid_aged';
+])
+  or die "Timed out while waiting for slot vacuum_test_slot to be invalidated by VACUUM";
+
+# Testcase end: Test VACUUM command triggering slot invalidation
+# =============================================================================
+
+ok(1, "all XID age invalidation tests completed successfully");
+
+done_testing();
-- 
2.50.1

Hayato Kuroda (Fujitsu)

kuroda.hayato@fujitsu.com

4 months ago

In reply to: John H (#1)

RE: Introduce XID age based replication slot invalidation

Dear John,

> The first issue can be mitigated by 'max_slot_wal_keep_size'. However
> in the second case there are no good mechanisms to prioritize write
> availability of the database and avoid wraparound. The new GUC
> 'idle_replication_slot_timeout' partially addresses the concern if you
> have similar workloads. However it's hard to set the same setting
> across a fleet of different applications.

IIUC, this feature avoids the wraparound issue more directly than the other
invalidation mechanisms. The motivation seems sufficient to me.

> The patch currently attempts to invalidate once-per-autovacuum worker.
> We're wondering if it should attempt invalidation on a per-relation
> basis within the vacuum call itself. That would account for scenarios
> where the cost_delay or naptime is high between autovac executions.

I have a concern that the age calculation acquires XidGenLock, so
performance can be affected. Do you have any insights on this?

> Invalidation happens in CHECKPOINT, similar to
> 'idle_replication_slot_timeout', and when VACUUM occurs.

Let me confirm, because I'm new to this. VACUUM also triggers the check because
an old XID makes VACUUM fail, right? The timeout is aimed at WAL, so it is not
closely related to VACUUM, which does not recycle segments.

In contrast, is there a possibility that XID-age check can be done only at VACUUM?

Regarding the patch, try_replication_slot_invalidation() and ReplicationSlotIsXIDAged()
do the same task. Can we reduce duplicated part?

Best regards,
Hayato Kuroda
FUJITSU LIMITED

John H

johnhyvr@gmail.com

4 months ago

In reply to: Hayato Kuroda (Fujitsu) (#2)

1 attachment(s)

Re: Introduce XID age based replication slot invalidation

Hi Hayato,

Thank you for taking a look.

> > The patch currently attempts to invalidate once-per-autovacuum worker.
> > We're wondering if it should attempt invalidation on a per-relation
> > basis within the vacuum call itself. That would account for scenarios
> > where the cost_delay or naptime is high between autovac executions.
>
> I have a concern that the age calculation acquires XidGenLock, so
> performance can be affected. Do you have any insights on this?

Are you concerned about the case where we do the check per table, or
about the current behavior where it happens only once per worker?

> > Invalidation happens in CHECKPOINT, similar to
> > 'idle_replication_slot_timeout', and when VACUUM occurs.
>
> Let me confirm, because I'm new to this. VACUUM also triggers the check because
> an old XID makes VACUUM fail, right? The timeout is aimed at WAL, so it is not
> closely related to VACUUM, which does not recycle segments.

I feel that the timeout is used as a way to roughly address storage
accumulation or VACUUM not progressing due to slots.

> In contrast, is there a possibility that XID-age check can be done only at VACUUM?

It's also done in CHECKPOINT because there can be stale replication
slots on a standby that don't exist on the writer. We would still want
them to be invalidated.

> Regarding the patch, try_replication_slot_invalidation() and ReplicationSlotIsXIDAged()
> do the same task. Can we reduce duplicated part?

Thanks for catching that. I thought I had done this, but apparently not.
Updated in the latest attachment.

--
John Hsu - Amazon Web Services

Attachments:

0045-Add-XID-age-based-replication-slot-invalidation.patch (application/octet-stream)

From b9c965a6459bfced99f59a60fef2897abe282159 Mon Sep 17 00:00:00 2001
From: John Hsu <johnhyvr@gmail.com>
Date: Fri, 8 Aug 2025 19:48:58 +0000
Subject: [PATCH] Add XID age based replication slot invalidation

This commit introduces max_slot_xid_age GUC
that allows replication slots whose xmin or
catalog_xmin has reached the age specified by
this setting to be invalidated.

Idle or forgotten replication slots can hold
back vacuum operations, leading to bloat or
transaction ID wraparound. Recovering from that
requires dropping the slot and vacuuming in
single-user mode. This setting avoids these
scenarios by proactively invalidating stale
slots on an XID basis.

Invalidation checks happen at various locations
to prevent wrap-around:
- During CHECKPOINT
- During vacuum (including autovacuum)

This change makes it easy for administrators to
protect against wrap-around concerns due to slots
that are falling behind.

Author: Bharath Rupireddy
Author: John Hsu
---
 doc/src/sgml/config.sgml                      |  26 ++
 doc/src/sgml/system-views.sgml                |   8 +
 src/backend/access/transam/xlog.c             |   4 +-
 src/backend/commands/vacuum.c                 |  45 +++-
 src/backend/replication/slot.c                |  80 +++++-
 src/backend/utils/misc/guc_parameters.dat     |   8 +
 src/include/replication/slot.h                |   7 +-
 .../t/049_invalidate_xid_aged_slots.pl        | 240 ++++++++++++++++++
 8 files changed, 411 insertions(+), 7 deletions(-)
 create mode 100644 src/test/recovery/t/049_invalidate_xid_aged_slots.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e9b420f3ddb..eb07bcd3551 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4653,6 +4653,32 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-max-slot-xid-age" xreflabel="max_slot_xid_age">
+      <term><varname>max_slot_xid_age</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>max_slot_xid_age</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Invalidate replication slots whose <literal>xmin</literal> (the oldest
+        transaction that this slot needs the database to retain) or
+        <literal>catalog_xmin</literal> (the oldest transaction affecting the
+        system catalogs that this slot needs the database to retain) has reached
+        the age specified by this setting. A value of zero (the default)
+        disables this feature. Users can set this value anywhere from zero to
+        two billion. This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command
+        line.
+       </para>
+
+       <para>
+        This invalidation check happens during vacuum (including
+        autovacuum) and during checkpoint.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-wal-sender-timeout" xreflabel="wal_sender_timeout">
       <term><varname>wal_sender_timeout</varname> (<type>integer</type>)
       <indexterm>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 4187191ea74..beeb22a7da4 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3007,6 +3007,14 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
           <xref linkend="guc-idle-replication-slot-timeout"/> duration.
          </para>
         </listitem>
+        <listitem>
+         <para>
+          <literal>xid_aged</literal> means that the slot's
+          <literal>xmin</literal> or <literal>catalog_xmin</literal>
+          has reached the age specified by the
+          <xref linkend="guc-max-slot-xid-age"/> parameter.
+         </para>
+        </listitem>
        </itemizedlist>
       </para></entry>
      </row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0baf0ac6160..41a48389afb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7347,7 +7347,7 @@ CreateCheckPoint(int flags)
 	 */
 	XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
 	KeepLogSeg(recptr, &_logSegNo);
-	if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
+	if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT | RS_INVAL_XID_AGE,
 										   _logSegNo, InvalidOid,
 										   InvalidTransactionId))
 	{
@@ -7801,7 +7801,7 @@ CreateRestartPoint(int flags)
 	replayPtr = GetXLogReplayRecPtr(&replayTLI);
 	endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
 	KeepLogSeg(endptr, &_logSegNo);
-	if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
+	if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT | RS_INVAL_XID_AGE,
 										   _logSegNo, InvalidOid,
 										   InvalidTransactionId))
 	{
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7c..28be9225501 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -47,6 +47,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/interrupt.h"
+#include "replication/slot.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/pmsignal.h"
@@ -129,6 +130,7 @@ static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
 static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static void try_replication_slot_invalidation(void);
 
 /*
  * GUC check function to ensure GUC value specified is within the allowable
@@ -471,6 +473,31 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 	MemoryContextDelete(vac_context);
 }
 
+/*
+ * Try invalidating replication slots based on current replication slot xmin
+ * limits once every vacuum cycle.
+ */
+static void
+try_replication_slot_invalidation(void)
+{
+	TransactionId min_slot_xmin = InvalidTransactionId;
+	TransactionId min_slot_catalog_xmin = InvalidTransactionId;
+
+	ProcArrayGetReplicationSlotXmin(&min_slot_xmin, &min_slot_catalog_xmin);
+
+	if (ReplicationSlotIsXIDAged(min_slot_xmin, min_slot_catalog_xmin))
+	{
+		/*
+		 * Note that InvalidateObsoleteReplicationSlots is also called as part
+		 * of CHECKPOINT, and emitting ERRORs from within is avoided already.
+		 * Therefore, there is no concern here that any ERROR from
+		 * invalidating replication slots blocks VACUUM.
+		 */
+		InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE, 0,
+										   InvalidOid, InvalidTransactionId);
+	}
+}
+
 /*
  * Internal entry point for autovacuum and the VACUUM / ANALYZE commands.
  *
@@ -498,7 +525,7 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 	   MemoryContext vac_context, bool isTopLevel)
 {
 	static bool in_vacuum = false;
-
+	static bool first_time = true;
 	const char *stmttype;
 	volatile bool in_outer_xact,
 				use_own_xacts;
@@ -611,6 +638,22 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 		CommitTransactionCommand();
 	}
 
+	if (params.options & VACOPT_VACUUM)
+	{
+		if (first_time)
+			try_replication_slot_invalidation();
+
+		/*
+		 * Each autovacuum worker attempts to invalidate replication slots only
+		 * once. A slot that exceeds max_slot_xid_age after that attempt is not
+		 * invalidated until the next worker starts (or a checkpoint runs). A
+		 * manual VACUUM always attempts invalidation, to account for long-lived
+		 * maintenance connections.
+		 */
+		if (AmAutoVacuumWorkerProcess())
+			first_time = false;
+	}
+
 	/* Turn vacuum cost accounting on or off, and set/clear in_vacuum */
 	PG_TRY();
 	{
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fd0fdb96d42..b4fc0ba3702 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -116,6 +116,7 @@ static const SlotInvalidationCauseMap SlotInvalidationCauses[] = {
 	{RS_INVAL_HORIZON, "rows_removed"},
 	{RS_INVAL_WAL_LEVEL, "wal_level_insufficient"},
 	{RS_INVAL_IDLE_TIMEOUT, "idle_timeout"},
+	{RS_INVAL_XID_AGE, "xid_aged"},
 };
 
 /*
@@ -157,6 +158,12 @@ int			max_replication_slots = 10; /* the maximum number of replication
  */
 int			idle_replication_slot_timeout_secs = 0;
 
+/*
+ * Invalidate replication slots whose xmin or catalog_xmin is older than
+ * the specified age; '0' disables it.
+ */
+int			max_slot_xid_age = 0;
+
 /*
  * This GUC lists streaming replication standby server slot names that
  * logical WAL sender processes will wait for.
@@ -1620,7 +1627,9 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
 					   XLogRecPtr restart_lsn,
 					   XLogRecPtr oldestLSN,
 					   TransactionId snapshotConflictHorizon,
-					   long slot_idle_seconds)
+					   long slot_idle_seconds,
+					   TransactionId xmin,
+					   TransactionId catalog_xmin)
 {
 	StringInfoData err_detail;
 	StringInfoData err_hint;
@@ -1665,6 +1674,26 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
 								 "idle_replication_slot_timeout");
 				break;
 			}
+
+		case RS_INVAL_XID_AGE:
+			{
+				Assert(TransactionIdIsValid(xmin) || TransactionIdIsValid(catalog_xmin));
+
+				if (TransactionIdIsValid(xmin))
+					appendStringInfo(&err_detail, _("The slot's xmin %u exceeds the maximum xid age %d specified by \"max_slot_xid_age\"."),
+								 xmin,
+								 max_slot_xid_age);
+				else
+					appendStringInfo(&err_detail, _("The slot's catalog_xmin %u exceeds the maximum xid age %d specified by \"max_slot_xid_age\"."),
+								 catalog_xmin,
+								 max_slot_xid_age);
+
+				appendStringInfo(&err_hint, _("You might need to increase \"%s\"."),
+								 "max_slot_xid_age");
+
+
+				break;
+			}
 		case RS_INVAL_NONE:
 			pg_unreachable();
 	}
@@ -1783,6 +1812,13 @@ DetermineSlotInvalidationCause(uint32 possible_causes, ReplicationSlot *s,
 		}
 	}
 
+	if (possible_causes & RS_INVAL_XID_AGE)
+	{
+		/* Safe since we hold the replication slot's spinlock needed to avoid race conditions */
+		if (ReplicationSlotIsXIDAged(s->data.xmin, s->data.catalog_xmin))
+			return RS_INVAL_XID_AGE;
+	}
+
 	return RS_INVAL_NONE;
 }
 
@@ -1972,7 +2008,7 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
 				ReportSlotInvalidation(invalidation_cause, true, active_pid,
 									   slotname, restart_lsn,
 									   oldestLSN, snapshotConflictHorizon,
-									   slot_idle_secs);
+									   slot_idle_secs, s->data.xmin, s->data.catalog_xmin);
 
 				if (MyBackendType == B_STARTUP)
 					(void) SendProcSignal(active_pid,
@@ -2019,7 +2055,7 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
 			ReportSlotInvalidation(invalidation_cause, false, active_pid,
 								   slotname, restart_lsn,
 								   oldestLSN, snapshotConflictHorizon,
-								   slot_idle_secs);
+								   slot_idle_secs, s->data.xmin, s->data.catalog_xmin);
 
 			/* done with this slot for now */
 			break;
@@ -2044,6 +2080,8 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
  * - RS_INVAL_WAL_LEVEL: is logical and wal_level is insufficient
  * - RS_INVAL_IDLE_TIMEOUT: has been idle longer than the configured
  *   "idle_replication_slot_timeout" duration.
+ * - RS_INVAL_XID_AGE: slot's xmin or catalog_xmin is older than the
+ *   configured "max_slot_xid_age"
  *
  * Note: This function attempts to invalidate the slot for multiple possible
  * causes in a single pass, minimizing redundant iterations. The "cause"
@@ -3093,3 +3131,39 @@ WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn)
 
 	ConditionVariableCancelSleep();
 }
+
+/*
+ * Return true if the given xmin or catalog_xmin is older than the age
+ * specified by max_slot_xid_age.
+ */
+bool
+ReplicationSlotIsXIDAged(TransactionId xmin, TransactionId catalog_xmin)
+{
+	TransactionId cutoff;
+	TransactionId curr;
+	bool is_aged = false;
+
+	if (max_slot_xid_age == 0)
+		return false;
+
+	curr = ReadNextTransactionId();
+
+	/*
+	 * Calculate oldest XID a slot's xmin or catalog_xmin can have before
+	 * they are invalidated.
+	 */
+	cutoff = curr - max_slot_xid_age;
+
+	if (!TransactionIdIsNormal(cutoff))
+		cutoff = FirstNormalTransactionId;
+
+	if (TransactionIdIsNormal(xmin) &&
+		TransactionIdPrecedesOrEquals(xmin, cutoff))
+		is_aged = true;
+
+	if (TransactionIdIsNormal(catalog_xmin) &&
+		TransactionIdPrecedesOrEquals(catalog_xmin, cutoff))
+		is_aged = true;
+
+	return is_aged;
+}
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 6bc6be13d2a..c162268c314 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -1717,6 +1717,14 @@
   max => 'INT_MAX',
 },
 
+{ name => 'max_slot_xid_age', type => 'int', context => 'PGC_SIGHUP', group => 'REPLICATION_SENDING',
+  short_desc => 'Age of the transaction ID at which a replication slot gets invalidated.',
+  variable => 'max_slot_xid_age',
+  boot_val => '0',
+  min => '0',
+  max => '2000000000',
+},
+
 # we have no microseconds designation, so can't supply units here
 { name => 'commit_delay', type => 'int', context => 'PGC_SUSET', group => 'WAL_SETTINGS',
   short_desc => 'Sets the delay in microseconds between transaction commit and flushing WAL to disk.',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index fe62162cde3..3253f108ffc 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -66,10 +66,12 @@ typedef enum ReplicationSlotInvalidationCause
 	RS_INVAL_WAL_LEVEL = (1 << 2),
 	/* idle slot timeout has occurred */
 	RS_INVAL_IDLE_TIMEOUT = (1 << 3),
+	/* slot's xmin or catalog_xmin has reached max xid age */
+	RS_INVAL_XID_AGE = (1 << 4),
 } ReplicationSlotInvalidationCause;
 
 /* Maximum number of invalidation causes */
-#define	RS_INVAL_MAX_CAUSES 4
+#define	RS_INVAL_MAX_CAUSES 5
 
 /*
  * On-Disk data of a replication slot, preserved across restarts.
@@ -293,6 +295,7 @@ extern PGDLLIMPORT ReplicationSlot *MyReplicationSlot;
 extern PGDLLIMPORT int max_replication_slots;
 extern PGDLLIMPORT char *synchronized_standby_slots;
 extern PGDLLIMPORT int idle_replication_slot_timeout_secs;
+extern PGDLLIMPORT int max_slot_xid_age;
 
 /* shmem initialization functions */
 extern Size ReplicationSlotsShmemSize(void);
@@ -350,4 +353,6 @@ extern bool SlotExistsInSyncStandbySlots(const char *slot_name);
 extern bool StandbySlotsHaveCaughtup(XLogRecPtr wait_for_lsn, int elevel);
 extern void WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn);
 
+extern bool ReplicationSlotIsXIDAged(TransactionId xmin, TransactionId catalog_xmin);
+
 #endif							/* SLOT_H */
diff --git a/src/test/recovery/t/049_invalidate_xid_aged_slots.pl b/src/test/recovery/t/049_invalidate_xid_aged_slots.pl
new file mode 100644
index 00000000000..f1e8f003f95
--- /dev/null
+++ b/src/test/recovery/t/049_invalidate_xid_aged_slots.pl
@@ -0,0 +1,240 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Test for replication slots invalidation due to XID age
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::BackgroundPsql;
+use PostgreSQL::Test::Utils;
+use PostgreSQL::Test::Cluster;
+use Test::More;
+
+# Wait for slot to first become inactive and then get invalidated
+sub wait_for_slot_invalidation
+{
+	my ($node, $slot_name, $reason) = @_;
+	my $name = $node->name;
+
+	# Wait for the inactive replication slot to be invalidated
+	$node->poll_query_until(
+		'postgres', qq[
+		SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+			WHERE slot_name = '$slot_name' AND
+			invalidation_reason = '$reason';
+	])
+	  or die
+	  "Timed out while waiting for inactive slot $slot_name to be invalidated on node $name";
+}
+
+# Do some work for advancing xids on a given node
+sub advance_xids
+{
+	my ($node, $table_name) = @_;
+
+	$node->safe_psql(
+		'postgres', qq[
+		do \$\$
+		begin
+		for i in 10000..11000 loop
+			-- use an exception block so that each iteration eats an XID
+			begin
+			insert into $table_name values (i);
+			exception
+			when division_by_zero then null;
+			end;
+		end loop;
+		end\$\$;
+	]);
+}
+
+# =============================================================================
+# Testcase start: Invalidate streaming standby's slot due to max_slot_xid_age
+# GUC.
+
+# Initialize primary node
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 'logical');
+
+# Configure primary with XID age settings
+$primary->append_conf(
+	'postgresql.conf', qq{
+max_slot_xid_age = 500
+});
+
+$primary->start;
+
+# Take a backup for creating standby
+my $backup_name = 'backup';
+$primary->backup($backup_name);
+
+# Create a standby linking to the primary using the replication slot
+my $standby = PostgreSQL::Test::Cluster->new('standby');
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+
+# Enable hs_feedback. The slot should gain an xmin. We set the status interval
+# so we'll see the results promptly.
+$standby->append_conf(
+	'postgresql.conf', q{
+primary_slot_name = 'sb_slot'
+hot_standby_feedback = on
+wal_receiver_status_interval = 1
+max_standby_streaming_delay = 3600000
+});
+
+$primary->safe_psql(
+	'postgres', qq[
+    SELECT pg_create_physical_replication_slot(slot_name := 'sb_slot', immediately_reserve := true);
+]);
+
+$standby->start;
+
+# Create some content on primary to move xmin
+$primary->safe_psql('postgres',
+	"CREATE TABLE tab_int AS SELECT generate_series(1,10) AS a");
+
+# Wait until standby has replayed enough data
+$primary->wait_for_catchup($standby);
+
+$primary->poll_query_until(
+	'postgres', qq[
+	SELECT (xmin IS NOT NULL) OR (catalog_xmin IS NOT NULL)
+		FROM pg_catalog.pg_replication_slots
+		WHERE slot_name = 'sb_slot';
+]) or die "Timed out waiting for slot sb_slot xmin to advance";
+
+# Instead of stopping the standby, hold a snapshot open on it so that
+# hot_standby_feedback pins the slot's xmin on the primary while XIDs
+# advance.
+my $standby_session = $standby->interactive_psql('postgres');
+$standby_session->query("BEGIN; SET default_transaction_isolation = 'repeatable read'; SELECT * FROM tab_int;");
+
+#$standby->stop;
+
+# Do some work to advance xids on primary
+advance_xids($primary, 'tab_int');
+
+# Wait for the replication slot to become inactive and then invalidated due to
+# XID age.
+$primary->safe_psql('postgres', "CHECKPOINT");
+wait_for_slot_invalidation($primary, 'sb_slot', 'xid_aged');
+
+$standby_session->quit;
+$standby->stop;
+
+# Testcase end: Invalidate streaming standby's slot due to max_slot_xid_age
+# GUC.
+# =============================================================================
+
+# =============================================================================
+# Testcase start: Invalidate logical subscriber's slot due to max_slot_xid_age
+# GUC.
+
+# Create a subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+$subscriber->start;
+
+# Create tables on both primary and subscriber
+$primary->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+$subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Insert some initial data
+$primary->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));");
+
+# Setup logical replication
+my $publisher_connstr = $primary->connstr . ' dbname=postgres';
+$primary->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR TABLE test_tbl");
+
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub WITH (slot_name = 'lsub_slot')"
+);
+
+# Wait for initial sync to complete
+$subscriber->wait_for_subscription_sync($primary, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM test_tbl");
+is($result, qq(5), "check initial copy was done for logical replication");
+
+# Wait for the logical slot to get catalog_xmin (logical slots use catalog_xmin, not xmin)
+$primary->poll_query_until(
+	'postgres', qq[
+	SELECT xmin IS NULL AND catalog_xmin IS NOT NULL
+	FROM pg_catalog.pg_replication_slots
+	WHERE slot_name = 'lsub_slot';
+]) or die "Timed out waiting for slot lsub_slot catalog_xmin to advance";
+
+# Stop subscriber to make the replication slot on primary inactive
+$subscriber->stop;
+
+# Do some work to advance xids on primary
+advance_xids($primary, 'test_tbl');
+
+# Wait for the replication slot to become inactive and then invalidated due to
+# XID age.
+$primary->safe_psql('postgres', "CHECKPOINT");
+wait_for_slot_invalidation($primary, 'lsub_slot', 'xid_aged');
+
+# Testcase end: Invalidate logical subscriber's slot due to max_slot_xid_age
+# GUC.
+# =============================================================================
+
+# =============================================================================
+# Testcase start: Test VACUUM command triggering slot invalidation
+#
+
+# Create another physical replication slot for VACUUM test
+$primary->safe_psql(
+	'postgres', qq[
+    SELECT pg_create_physical_replication_slot(slot_name := 'vacuum_test_slot', immediately_reserve := true);
+]);
+
+# Create a new standby for this test
+my $standby_vacuum = PostgreSQL::Test::Cluster->new('standby_vacuum');
+$standby_vacuum->init_from_backup($primary, $backup_name, has_streaming => 1);
+
+$standby_vacuum->append_conf(
+	'postgresql.conf', q{
+primary_slot_name = 'vacuum_test_slot'
+hot_standby_feedback = on
+wal_receiver_status_interval = 1
+});
+
+$standby_vacuum->start;
+
+# Wait until standby has replayed enough data and slot gets xmin
+$primary->wait_for_catchup($standby_vacuum);
+
+$primary->poll_query_until(
+	'postgres', qq[
+	SELECT (xmin IS NOT NULL) OR (catalog_xmin IS NOT NULL)
+		FROM pg_catalog.pg_replication_slots
+		WHERE slot_name = 'vacuum_test_slot';
+]) or die "Timed out waiting for slot vacuum_test_slot xmin to advance";
+
+# Stop the standby so that the replication slot's xmin on the primary ages
+$standby_vacuum->stop;
+
+# Do some work to advance xids on primary
+advance_xids($primary, 'tab_int');
+
+# Use VACUUM to trigger slot invalidation (instead of CHECKPOINT)
+# This tests that VACUUM command can trigger XID age invalidation
+$primary->safe_psql('postgres', "VACUUM");
+
+# Wait for the replication slot to become invalidated due to XID age triggered by VACUUM
+$primary->poll_query_until(
+	'postgres', qq[
+	SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+		WHERE slot_name = 'vacuum_test_slot' AND
+		invalidation_reason = 'xid_aged';
+])
+  or die "Timed out while waiting for slot vacuum_test_slot to be invalidated by VACUUM";
+
+# Testcase end: Test VACUUM command triggering slot invalidation
+# =============================================================================
+
+ok(1, "all XID age invalidation tests completed successfully");
+
+done_testing();
-- 
2.50.1
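
As a reading aid for the age check in ReplicationSlotIsXIDAged() above,
here is a minimal standalone sketch of the same cutoff arithmetic. It is
not part of the patch and makes several assumptions: next_xid stands in
for ReadNextTransactionId(), max_age stands in for max_slot_xid_age,
xid_precedes_or_equals() is a local re-implementation modeled on
PostgreSQL's modulo-2^32 XID comparison, and slot_is_xid_aged() is a
hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Circular (modulo-2^32) "id1 <= id2" comparison for normal XIDs.
 * Special XIDs (0, 1, 2) are compared as plain unsigned integers.
 */
static bool
xid_precedes_or_equals(TransactionId id1, TransactionId id2)
{
	int32_t		diff;

	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 <= id2;

	diff = (int32_t) (id1 - id2);
	return diff <= 0;
}

/*
 * Hypothetical stand-in for the patch's check: returns true if xmin or
 * catalog_xmin is at or behind the cutoff (next_xid - max_age).
 * A max_age of 0 disables the check.
 */
bool
slot_is_xid_aged(TransactionId next_xid, int max_age,
				 TransactionId xmin, TransactionId catalog_xmin)
{
	TransactionId cutoff;

	assert(max_age >= 0);

	if (max_age == 0)
		return false;

	/* Unsigned subtraction wraps around, which is what we want here. */
	cutoff = next_xid - (TransactionId) max_age;

	/* Never use one of the special XIDs as the cutoff. */
	if (!TransactionIdIsNormal(cutoff))
		cutoff = FirstNormalTransactionId;

	if (TransactionIdIsNormal(xmin) &&
		xid_precedes_or_equals(xmin, cutoff))
		return true;

	if (TransactionIdIsNormal(catalog_xmin) &&
		xid_precedes_or_equals(catalog_xmin, cutoff))
		return true;

	return false;
}
```

With next_xid = 2000 and max_age = 500 the cutoff is 1500, so an xmin of
1400 counts as aged while 1600 does not. Because the subtraction is
unsigned, the cutoff also lands in the expected place shortly after the
XID counter has wrapped (for example next_xid = 100).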

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

4 months ago

In reply to: John H (#1)

Re: Introduce XID age based replication slot invalidation

Hi,

On Thu, Sep 18, 2025 at 10:20 AM John H <johnhyvr@gmail.com> wrote:

> I'd like to restart the discussion about providing an xid-based slot
> invalidation mechanism. The previous effort [1] presented an XID and
> time-based invalidation and the inactive time-based approach was
> implemented first. The latest XID based patch from Bharath Rupireddy
> can be found here [2].
>
> When thinking about availability of the database, inactive replication
> slots cause two main pain points:
> 1) WAL accumulation
> 2) Replication slots with xmin/catalog_xmin can hold back vacuuming
> leading to wrap-around
>
> It's easy to imagine a high-XID churning workload in one cluster while
> another has large batch jobs where changes get synced out
> periodically. There isn't a "one-size" fits all setting for
> 'idle_replication_slot_timeout' in these two cases.

+1.

> The attached patch addresses this by introducing 'max_slot_xid_age' in
> a similar fashion. Replication slots with transaction ID greater than
> the set age will get invalidated allowing vacuum to proceed, biasing
> towards database availability.
>
> Invalidation happens in CHECKPOINT, similar to
> 'idle_replication_slot_timeout', and when VACUUM occurs.
>
> The patch currently attempts to invalidate once-per-autovacuum worker.
> We're wondering if it should attempt invalidation on a per-relation
> basis within the vacuum call itself. That would account for scenarios
> where the cost_delay or naptime is high between autovac executions.

IMO, computing XID horizons per-relation during vacuum is good. The
main reason we try to invalidate replication slots based on the XID
age in the vacuum path is to help the database when it needs it most -
when vacuum is computing the XID horizons. That said, it would be good
to have performance analysis with a large number of replication slots,
comparing once-per-relation vs. once-per-autovacuum worker vs.
once-per-autovacuum launcher wake-up cycle.

I haven't looked at the patch in depth, but it would be good to have a
TAP test with more realistic production workloads. We could set this
value to less than 1.5 billion and use xid_wraparound test to quickly
reach the wraparound limits, then verify if this setting can help
prevent the database from reaching wraparound errors. This approach
would also validate the age calculations in
try_replication_slot_invalidation with higher limits.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com