Bug in ProcArrayApplyRecoveryInfo for snapshots crossing 4B, breaking replicas

Started by Tomas Vondraalmost 4 years ago7 messages
#1Tomas Vondra
tomas.vondra@enterprisedb.com
1 attachment(s)

Hi,

There's a bug in ProcArrayApplyRecoveryInfo, introduced by 8431e296ea,
which may cause failures when starting a replica, making it unusable.
The commit message for 8431e296ea is not very clear about what exactly
is being done and why, but the root cause is that at while processing
RUNNING_XACTS, the XIDs are sorted like this:

/*
* Sort the array so that we can add them safely into
* KnownAssignedXids.
*/
qsort(xids, nxids, sizeof(TransactionId), xidComparator);

where "safely" likely means "not violating the ordering expected by
KnownAssignedXidsAdd". Unfortunately, xidComparator compares the values
as plain uint32 values, while KnownAssignedXidsAdd actually calls
TransactionIdFollowsOrEquals() and compares the logical XIDs :-(

Triggering this is pretty simple - all you need is two transactions with
XIDs before/after the 4B limit, and then (re)start a replica. The
replica refuses to start with a message like this:

LOG: 9 KnownAssignedXids (num=4 tail=0 head=4) [0]=32705 [1]=32706
[2]: =32707 [3]=32708 CONTEXT: WAL redo at 0/6000120 for Standby/RUNNING_XACTS: nextXid 32715 latestCompletedXid 32714 oldestRunningXid 4294967001; 8 xacts: 32708 32707 32706 32705 4294967009 4294967008 4294967007 4294967006 FATAL: out-of-order XID insertion in KnownAssignedXids
CONTEXT: WAL redo at 0/6000120 for Standby/RUNNING_XACTS: nextXid
32715 latestCompletedXid 32714 oldestRunningXid
4294967001; 8 xacts: 32708 32707 32706 32705 4294967009
4294967008 4294967007 4294967006
FATAL: out-of-order XID insertion in KnownAssignedXids

Clearly, we add the 4 "younger" XIDs first (because that's what the XID
comparator does), but then KnownAssignedXidsAdd thinks there's some sort
of corruption because logically 4294967006 is older.

This does not affect replicas in STANDBY_SNAPSHOT_READY state, because
in that case ProcArrayApplyRecoveryInfo ignores RUNNING_XACTS messages.

The probability of hitting this in practice is proportional to how long
you leave transactions running. The system where we observed this leaves
transactions with XIDs open for days, and the age may be ~40M.
Intuitivelly, that's ~40M/4B (=1%) probability that at any given time
there are transactions with contradicting ordering. So most restarts
worked fine, until one that happened at just the "right" time.

This likely explains why we never got any reports about this - most
systems probably don't leave transactions running for this long, so the
probability is much lower. And replica restarts are generally not that
common events either.

Attached patch is fixing this by just sorting the XIDs logically. The
xidComparator is meant for places that can't do logical ordering. But
these XIDs come from RUNNING_XACTS, so they actually come from the same
wraparound epoch (so sorting logically seems perfectly fine).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Fix-ordering-of-XIDs-in-ProcArrayApplyRecoveryInfo.patchtext/x-patch; charset=UTF-8; name=0001-Fix-ordering-of-XIDs-in-ProcArrayApplyRecoveryInfo.patchDownload
From c96f5d64f977bf252451657cfe80d603869a2c1e Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 23 Jan 2022 01:00:29 +0100
Subject: [PATCH] Fix ordering of XIDs in ProcArrayApplyRecoveryInfo

Commit 8431e296ea reworked ProcArrayApplyRecoveryInfo to sort XIDs
before adding them to KnownAssignedXids, but it used xidComparator for
that purpose, which simply compares the values as uint32 values. This
may easily confuse KnownAssignedXidsAdd() which enforces the logical
ordering by calling TransactionIdFollowsOrEquals() etc.

Hitting this issue is fairly easy - you just need two transactons, one
started before the 4B limit (e.g. XID 4294967290), the other sometime
after it (e.g. XID 1000). Logically (4294967290 <= 1000) but when
compared using xidComparator we try to add them in the opposite order.
Which makes KnownAssignedXidsAdd() fail with an error like this:

  ERROR: out-of-order XID insertion in KnownAssignedXids

This however only happens during replica initialization - so you have to
restart (or create) a replica while there are such transactions with
contradictory XID ordering. Long-running transactions naturally increase
the likelihood of hitting this issue. Once the replica gets into this
state, it can't be started (even if the old transactions are terminated
in some way).

Fixed by sorting the XIDs logically - this is fine because we're dealing
with normal XIDs (because it's XIDs assigned to backends) and from the
same wraparound epoch (otherwise the backends could not be running at
the same time on the primary node). So there are no problems with the
triangle inequality, which is why xidComparator compares raw values.

Patch by me. Investigation and root cause analysis by Abhijit Menon-Sen.

This issue is present in all releases since 9.4, however releases up to
9.6 are EOL already so backpatch to 10 only.

Reviewed-by: Abhijit Menon-Sen
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
---
 src/backend/storage/ipc/procarray.c |  7 ++++++-
 src/backend/utils/adt/xid.c         | 26 ++++++++++++++++++++++++++
 src/include/utils/builtins.h        |  1 +
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 3be60402890..9d3efb7d80e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1164,8 +1164,13 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 		/*
 		 * Sort the array so that we can add them safely into
 		 * KnownAssignedXids.
+		 *
+		 * We have to sort them logically, because in KnownAssignedXidsAdd we
+		 * call TransactionIdFollowsOrEquals and so on. But we know these XIDs
+		 * come from RUNNING_XACTS, which means there are only normal XIDs from
+		 * the same epoch, so this is safe.
 		 */
-		qsort(xids, nxids, sizeof(TransactionId), xidComparator);
+		qsort(xids, nxids, sizeof(TransactionId), xidLogicalComparator);
 
 		/*
 		 * Add the sorted snapshot into KnownAssignedXids.  The running-xacts
diff --git a/src/backend/utils/adt/xid.c b/src/backend/utils/adt/xid.c
index e6633e9cffe..9b4ceaea47f 100644
--- a/src/backend/utils/adt/xid.c
+++ b/src/backend/utils/adt/xid.c
@@ -145,6 +145,32 @@ xidComparator(const void *arg1, const void *arg2)
 	return 0;
 }
 
+/*
+ * xidLogicalComparator
+ *		qsort comparison function for XIDs
+ *
+ * This is used to compare only XIDs from the same epoch (e.g. for backends
+ * running at the same time). So there must be only normal XIDs, so there's
+ * no issue with triangle inequality.
+ */
+int
+xidLogicalComparator(const void *arg1, const void *arg2)
+{
+	TransactionId xid1 = *(const TransactionId *) arg1;
+	TransactionId xid2 = *(const TransactionId *) arg2;
+
+	Assert(TransactionIdIsNormal(xid1));
+	Assert(TransactionIdIsNormal(xid2));
+
+	if (TransactionIdPrecedes(xid1, xid2))
+		return -1;
+
+	if (TransactionIdPrecedes(xid2, xid1))
+		return 1;
+
+	return 0;
+}
+
 Datum
 xid8toxid(PG_FUNCTION_ARGS)
 {
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 7ac4780e3fc..d8f05a7df3a 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -87,6 +87,7 @@ extern void text_to_cstring_buffer(const text *src, char *dst, size_t dst_len);
 
 /* xid.c */
 extern int	xidComparator(const void *arg1, const void *arg2);
+extern int	xidLogicalComparator(const void *arg1, const void *arg2);
 
 /* inet_cidr_ntop.c */
 extern char *pg_inet_cidr_ntop(int af, const void *src, int bits,
-- 
2.31.1

#2Bossart, Nathan
bossartn@amazon.com
In reply to: Tomas Vondra (#1)
Re: Bug in ProcArrayApplyRecoveryInfo for snapshots crossing 4B, breaking replicas

On 1/22/22, 4:43 PM, "Tomas Vondra" <tomas.vondra@enterprisedb.com> wrote:

There's a bug in ProcArrayApplyRecoveryInfo, introduced by 8431e296ea,
which may cause failures when starting a replica, making it unusable.
The commit message for 8431e296ea is not very clear about what exactly
is being done and why, but the root cause is that at while processing
RUNNING_XACTS, the XIDs are sorted like this:

/*
* Sort the array so that we can add them safely into
* KnownAssignedXids.
*/
qsort(xids, nxids, sizeof(TransactionId), xidComparator);

where "safely" likely means "not violating the ordering expected by
KnownAssignedXidsAdd". Unfortunately, xidComparator compares the values
as plain uint32 values, while KnownAssignedXidsAdd actually calls
TransactionIdFollowsOrEquals() and compares the logical XIDs :-(

Wow, nice find.

This likely explains why we never got any reports about this - most
systems probably don't leave transactions running for this long, so the
probability is much lower. And replica restarts are generally not that
common events either.

I'm aware of one report with the same message [0]/messages/by-id/1476795473014.15979.2188@webmail4, but I haven't read
closely enough to determine whether it is the same issue. It looks
like that particular report was attributed to backup_label being
removed.

Attached patch is fixing this by just sorting the XIDs logically. The
xidComparator is meant for places that can't do logical ordering. But
these XIDs come from RUNNING_XACTS, so they actually come from the same
wraparound epoch (so sorting logically seems perfectly fine).

The patch looks reasonable to me.

Nathan

[0]: /messages/by-id/1476795473014.15979.2188@webmail4

#3Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Bossart, Nathan (#2)
Re: Bug in ProcArrayApplyRecoveryInfo for snapshots crossing 4B, breaking replicas

On 1/24/22 22:28, Bossart, Nathan wrote:

On 1/22/22, 4:43 PM, "Tomas Vondra" <tomas.vondra@enterprisedb.com> wrote:

There's a bug in ProcArrayApplyRecoveryInfo, introduced by 8431e296ea,
which may cause failures when starting a replica, making it unusable.
The commit message for 8431e296ea is not very clear about what exactly
is being done and why, but the root cause is that at while processing
RUNNING_XACTS, the XIDs are sorted like this:

/*
* Sort the array so that we can add them safely into
* KnownAssignedXids.
*/
qsort(xids, nxids, sizeof(TransactionId), xidComparator);

where "safely" likely means "not violating the ordering expected by
KnownAssignedXidsAdd". Unfortunately, xidComparator compares the values
as plain uint32 values, while KnownAssignedXidsAdd actually calls
TransactionIdFollowsOrEquals() and compares the logical XIDs :-(

Wow, nice find.

This likely explains why we never got any reports about this - most
systems probably don't leave transactions running for this long, so the
probability is much lower. And replica restarts are generally not that
common events either.

I'm aware of one report with the same message [0], but I haven't read
closely enough to determine whether it is the same issue. It looks
like that particular report was attributed to backup_label being
removed.

Yeah, I saw that thread too, and I don't think it's the same issue. As
you say, it seems to be caused by the backup_label shenanigans, and
there's also the RUNNING_XACTS message:

Sep 20 15:00:27 ... CONTEXT: xlog redo Standby/RUNNING_XACTS: nextXid
38585 latestCompletedXid 38571 oldestRunningXid 38572; 14 xacts: 38573
38575 38579 38578 38574 38581 38580 38576 38577 38572 38582 38584 38583
38583

The XIDs don't cross the 4B boundary at all, so this seems unrelated.

Attached patch is fixing this by just sorting the XIDs logically. The
xidComparator is meant for places that can't do logical ordering. But
these XIDs come from RUNNING_XACTS, so they actually come from the same
wraparound epoch (so sorting logically seems perfectly fine).

The patch looks reasonable to me.

Thanks!

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Michael Paquier
michael@paquier.xyz
In reply to: Tomas Vondra (#3)
Re: Bug in ProcArrayApplyRecoveryInfo for snapshots crossing 4B, breaking replicas

On Mon, Jan 24, 2022 at 10:45:48PM +0100, Tomas Vondra wrote:

On 1/24/22 22:28, Bossart, Nathan wrote:

Attached patch is fixing this by just sorting the XIDs logically. The
xidComparator is meant for places that can't do logical ordering. But
these XIDs come from RUNNING_XACTS, so they actually come from the same
wraparound epoch (so sorting logically seems perfectly fine).

The patch looks reasonable to me.

Thanks!

Could it be possible to add a TAP test? One idea would be to rely on
pg_resetwal -x and -e close to the 4B limit to set up a node before
stressing the scenario of this bug, so that would be rather cheap.
--
Michael

#5Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Michael Paquier (#4)
2 attachment(s)
Re: Bug in ProcArrayApplyRecoveryInfo for snapshots crossing 4B, breaking replicas

On 1/25/22 04:25, Michael Paquier wrote:

On Mon, Jan 24, 2022 at 10:45:48PM +0100, Tomas Vondra wrote:

On 1/24/22 22:28, Bossart, Nathan wrote:

Attached patch is fixing this by just sorting the XIDs logically. The
xidComparator is meant for places that can't do logical ordering. But
these XIDs come from RUNNING_XACTS, so they actually come from the same
wraparound epoch (so sorting logically seems perfectly fine).

The patch looks reasonable to me.

Thanks!

Could it be possible to add a TAP test? One idea would be to rely on
pg_resetwal -x and -e close to the 4B limit to set up a node before
stressing the scenario of this bug, so that would be rather cheap.

I actually tried doing that, but I was not very happy with the result.
The test has to call pg_resetwal, but then it also has to fake pg_xact
data and so on, which seemed a bit ugly so did not include the test in
the patch.

But maybe there's a better way to do this, so here it is. I've kept it
separately, so that it's possible to apply it without the fix, to verify
it actually triggers the issue.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Add-TAP-test.patchtext/x-patch; charset=UTF-8; name=0001-Add-TAP-test.patchDownload
From efb0e79d752102dd0fa066959cea517288cc6085 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Wed, 26 Jan 2022 19:24:21 +0100
Subject: [PATCH 1/2] Add TAP test

---
 src/test/recovery/t/028_wraparound_restart.pl | 172 ++++++++++++++++++
 1 file changed, 172 insertions(+)
 create mode 100644 src/test/recovery/t/028_wraparound_restart.pl

diff --git a/src/test/recovery/t/028_wraparound_restart.pl b/src/test/recovery/t/028_wraparound_restart.pl
new file mode 100644
index 00000000000..2be5537b6b3
--- /dev/null
+++ b/src/test/recovery/t/028_wraparound_restart.pl
@@ -0,0 +1,172 @@
+# Run the standard regression tests with streaming replication
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 36;
+use File::Basename;
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->adjust_conf('postgresql.conf', 'max_connections', '25', 1);
+$node_primary->append_conf('postgresql.conf', 'max_prepared_transactions = 10');
+
+# needed so that vacuumdb freezes the template too, and we don't get warnings
+# about possible data loss due to wraparound
+$node_primary->start;
+$node_primary->psql(
+        'postgres',
+        qq[UPDATE pg_database SET datallowconn = 't']);
+$node_primary->stop;
+
+# reset WAL close to max XID (4294967295), in 4 smaller steps (~1B each)
+command_checks_all(
+	[ 'pg_resetwal', '-x', 1000000000, $node_primary->data_dir ],
+	0,
+	[qr''],
+	[qr''],
+	'reset WAL to 1B XID');
+
+command_checks_all(
+	[ 'dd', 'if=/dev/zero', 'of=' . $node_primary->data_dir . '/pg_xact/03B9', 'bs=1K', 'count=256'],
+	0,
+	[qr''],
+	[qr''],
+	'reset WAL to 1B XID');
+
+$node_primary->start;
+$node_primary->issues_sql_like(
+	[ 'vacuumdb', '--all', '--freeze', '--echo' ],
+	qr/statement: VACUUM.*statement: VACUUM/s,
+	'vacuum all databases');
+$node_primary->stop;
+
+command_checks_all(
+	[ 'pg_resetwal', '-x', 2000000000, $node_primary->data_dir ],
+	0,
+	[qr''],
+	[qr''],
+	'reset WAL to 2B XID');
+
+command_checks_all(
+	[ 'dd', 'if=/dev/zero', 'of=' . $node_primary->data_dir . '/pg_xact/0773', 'bs=1K', 'count=256'],
+	0,
+	[qr''],
+	[qr''],
+	'reset WAL to 2B XID');
+
+$node_primary->start;
+$node_primary->issues_sql_like(
+	[ 'vacuumdb', '--all', '--freeze', '--echo' ],
+	qr/statement: VACUUM.*statement: VACUUM/s,
+	'vacuum all databases');
+$node_primary->stop;
+
+command_checks_all(
+	[ 'pg_resetwal', '-x', 3000000000, $node_primary->data_dir ],
+	0,
+	[qr''],
+	[qr''],
+	'reset WAL to 3B XID');
+
+command_checks_all(
+	[ 'dd', 'if=/dev/zero', 'of=' . $node_primary->data_dir . '/pg_xact/0B2D', 'bs=1K', 'count=256'],
+	0,
+	[qr''],
+	[qr''],
+	'reset WAL to 3B XID');
+
+$node_primary->start;
+
+is( $node_primary->psql(
+        'postgres',
+        qq[SELECT datname
+    , age(datfrozenxid)
+    , current_setting('autovacuum_freeze_max_age') 
+FROM pg_database 
+ORDER BY 2 DESC]),
+    0,
+    'physical slot created on primary');
+
+$node_primary->issues_sql_like(
+	[ 'vacuumdb', '--all', '--freeze', '--echo' ],
+	qr/statement: VACUUM.*statement: VACUUM/s,
+	'vacuum all databases');
+$node_primary->stop;
+
+# the max is 4294967295, so this is within 100 transactions of wraparound
+command_checks_all(
+	[ 'pg_resetwal', '-x', 4294967200, $node_primary->data_dir ],
+	0,
+	[qr''],
+	[qr''],
+	'reset WAL to 4B XID');
+
+command_checks_all(
+	[ 'dd', 'if=/dev/zero', 'of=' . $node_primary->data_dir . '/pg_xact/0FFF', 'bs=1K', 'count=256'],
+	0,
+	[qr''],
+	[qr''],
+	'reset WAL to 4B XID');
+
+$node_primary->start;
+$node_primary->issues_sql_like(
+	[ 'vacuumdb', '--all', '--freeze', '--echo' ],
+	qr/statement: VACUUM.*statement: VACUUM/s,
+	'vacuum all databases');
+
+
+# now, start transactions with XID before/after the 4B limit
+my $main_in    = '';
+my $main_out   = '';
+my $main_timer = IPC::Run::timeout(180);
+
+my $main_h =
+  $node_primary->background_psql('postgres', \$main_in, \$main_out,
+	$main_timer, on_error_stop => 1);
+$main_in .= q(
+BEGIN;
+SELECT txid_current();
+\echo syncpoint1
+);
+pump $main_h until $main_out =~ /syncpoint1/ || $main_timer->is_expired;
+
+# run 200 trivial transactions to consume enough XIDs and cross 4B
+$node_primary->pgbench(
+	"-t 200 -n",
+	0,
+	[qr{}],
+	[qr{^$}],
+	'pgbench custom scripts',
+	{
+		'001_pgbench_custom_script_1@1' => q{-- select only
+SELECT txid_current();
+}
+	});
+
+$main_timer = IPC::Run::timeout(180);
+$main_h =
+  $node_primary->background_psql('postgres', \$main_in, \$main_out,
+	$main_timer, on_error_stop => 1);
+$main_in .= q(
+BEGIN;
+SELECT txid_current();
+\echo syncpoint1
+);
+pump $main_h until $main_out =~ /syncpoint1/ || $main_timer->is_expired;
+
+# Take backup and try starting a replica from the backup
+my $backup_name = 'my_backup';
+
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = PostgreSQL::Test::Cluster->new('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+    
+$node_standby_1->start;
+
+$node_standby_1->stop;
+$node_primary->stop;
-- 
2.31.1

0002-Fix-ordering-of-XIDs-in-ProcArrayApplyRecoveryInfo.patchtext/x-patch; charset=UTF-8; name=0002-Fix-ordering-of-XIDs-in-ProcArrayApplyRecoveryInfo.patchDownload
From 5570c1b4840999d6b6dd8bae699de6742b18fe73 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 23 Jan 2022 01:00:29 +0100
Subject: [PATCH 2/2] Fix ordering of XIDs in ProcArrayApplyRecoveryInfo

Commit 8431e296ea reworked ProcArrayApplyRecoveryInfo to sort XIDs
before adding them to KnownAssignedXids, but it used xidComparator for
that purpose, which simply compares the values as uint32 values. This
may easily confuse KnownAssignedXidsAdd() which enforces the logical
ordering by calling TransactionIdFollowsOrEquals() etc.

Hitting this issue is fairly easy - you just need two transactons, one
started before the 4B limit (e.g. XID 4294967290), the other sometime
after it (e.g. XID 1000). Logically (4294967290 <= 1000) but when
compared using xidComparator we try to add them in the opposite order.
Which makes KnownAssignedXidsAdd() fail with an error like this:

  ERROR: out-of-order XID insertion in KnownAssignedXids

This however only happens during replica initialization - so you have to
restart (or create) a replica while there are such transactions with
contradictory XID ordering. Long-running transactions naturally increase
the likelihood of hitting this issue. Once the replica gets into this
state, it can't be started (even if the old transactions are terminated
in some way).

Fixed by sorting the XIDs logically - this is fine because we're dealing
with normal XIDs (because it's XIDs assigned to backends) and from the
same wraparound epoch (otherwise the backends could not be running at
the same time on the primary node). So there are no problems with the
triangle inequality, which is why xidComparator compares raw values.

Patch by me. Investigation and root cause analysis by Abhijit Menon-Sen.

This issue is present in all releases since 9.4, however releases up to
9.6 are EOL already so backpatch to 10 only.

Reviewed-by: Abhijit Menon-Sen
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
---
 src/backend/storage/ipc/procarray.c |  7 ++++++-
 src/backend/utils/adt/xid.c         | 26 ++++++++++++++++++++++++++
 src/include/utils/builtins.h        |  1 +
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 3be60402890..9d3efb7d80e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1164,8 +1164,13 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 		/*
 		 * Sort the array so that we can add them safely into
 		 * KnownAssignedXids.
+		 *
+		 * We have to sort them logically, because in KnownAssignedXidsAdd we
+		 * call TransactionIdFollowsOrEquals and so on. But we know these XIDs
+		 * come from RUNNING_XACTS, which means there are only normal XIDs from
+		 * the same epoch, so this is safe.
 		 */
-		qsort(xids, nxids, sizeof(TransactionId), xidComparator);
+		qsort(xids, nxids, sizeof(TransactionId), xidLogicalComparator);
 
 		/*
 		 * Add the sorted snapshot into KnownAssignedXids.  The running-xacts
diff --git a/src/backend/utils/adt/xid.c b/src/backend/utils/adt/xid.c
index e6633e9cffe..9b4ceaea47f 100644
--- a/src/backend/utils/adt/xid.c
+++ b/src/backend/utils/adt/xid.c
@@ -145,6 +145,32 @@ xidComparator(const void *arg1, const void *arg2)
 	return 0;
 }
 
+/*
+ * xidLogicalComparator
+ *		qsort comparison function for XIDs
+ *
+ * This is used to compare only XIDs from the same epoch (e.g. for backends
+ * running at the same time). So there must be only normal XIDs, so there's
+ * no issue with triangle inequality.
+ */
+int
+xidLogicalComparator(const void *arg1, const void *arg2)
+{
+	TransactionId xid1 = *(const TransactionId *) arg1;
+	TransactionId xid2 = *(const TransactionId *) arg2;
+
+	Assert(TransactionIdIsNormal(xid1));
+	Assert(TransactionIdIsNormal(xid2));
+
+	if (TransactionIdPrecedes(xid1, xid2))
+		return -1;
+
+	if (TransactionIdPrecedes(xid2, xid1))
+		return 1;
+
+	return 0;
+}
+
 Datum
 xid8toxid(PG_FUNCTION_ARGS)
 {
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 7ac4780e3fc..d8f05a7df3a 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -87,6 +87,7 @@ extern void text_to_cstring_buffer(const text *src, char *dst, size_t dst_len);
 
 /* xid.c */
 extern int	xidComparator(const void *arg1, const void *arg2);
+extern int	xidLogicalComparator(const void *arg1, const void *arg2);
 
 /* inet_cidr_ntop.c */
 extern char *pg_inet_cidr_ntop(int af, const void *src, int bits,
-- 
2.31.1

#6Michael Paquier
michael@paquier.xyz
In reply to: Tomas Vondra (#5)
Re: Bug in ProcArrayApplyRecoveryInfo for snapshots crossing 4B, breaking replicas

On Wed, Jan 26, 2022 at 07:31:00PM +0100, Tomas Vondra wrote:

I actually tried doing that, but I was not very happy with the result. The
test has to call pg_resetwal, but then it also has to fake pg_xact data and
so on, which seemed a bit ugly so did not include the test in the patch.

Indeed, the dependency to /dev/zero is not good either. The patch
logic looks good to me.
--
Michael

#7Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Michael Paquier (#6)
Re: Bug in ProcArrayApplyRecoveryInfo for snapshots crossing 4B, breaking replicas

On 1/26/22 23:54, Michael Paquier wrote:

On Wed, Jan 26, 2022 at 07:31:00PM +0100, Tomas Vondra wrote:

I actually tried doing that, but I was not very happy with the result. The
test has to call pg_resetwal, but then it also has to fake pg_xact data and
so on, which seemed a bit ugly so did not include the test in the patch.

Indeed, the dependency to /dev/zero is not good either. The patch
logic looks good to me.

OK, I've pushed the patch. We may consider adding a TAP test later, if
we find a reasonably clean approach.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company