BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Started by Michail Nikolaevabout 3 years ago38 messages
#1Michail Nikolaev
michail.nikolaev@gmail.com

Hello, hackers.

It seems like PG 14 works incorrectly with vacuum_defer_cleanup_age
(or just not cleared rows, not sure) and SELECT FOR UPDATE + UPDATE.
I am not certain, but hot_standby_feedback probably able to cause the
same issues.

Steps to reproduce:

1) Start Postgres like this:

docker run -it -p 5432:5432 --name pg -e
POSTGRES_PASSWORD=postgres -e LANG=C.UTF-8 -d postgres:14.6 -c
vacuum_defer_cleanup_age=1000000

2) Prepare scheme:

CREATE TABLE something_is_wrong_here (id bigserial PRIMARY KEY,
value numeric(15,4) DEFAULT 0 NOT NULL);
INSERT INTO something_is_wrong_here (value) (SELECT 10000 from
generate_series(0, 100));

3) Prepare file for pgbench:

BEGIN;

SELECT omg.*
FROM something_is_wrong_here AS omg
ORDER BY random()
LIMIT 1
FOR UPDATE
\gset

UPDATE something_is_wrong_here SET value = :value + 1 WHERE id = :id;

END;

4) Run pgbench:

pgbench -c 50 -j 2 -n -f reproduce.bench 'port= 5432
host=localhost user=postgres dbname=postgres password=postgres' -T 100
-P 1

I was able to see such a set of errors (looks scary):

ERROR: MultiXactId 30818104 has not been created yet -- apparent wraparound
ERROR: could not open file "base/13757/16385.1" (target block
39591744): previous segment is only 24 blocks
ERROR: attempted to lock invisible tuple
ERROR: could not access status of transaction 38195704
DETAIL: Could not open file "pg_subtrans/0246": No such file or directory.

Best regards,
Michail.

#2Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: Michail Nikolaev (#1)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hello, Andres.

I apologize for the direct ping, but I think your snapshot scalability
work in PG14 could be related to the issue.

The TransactionIdRetreatedBy implementation looks correct... But with
txid_current=212195 I see errors like "could not access status of
transaction 58643736"...
So, maybe vacuum_defer_cleanup_age just highlights some special case
(something with "previous" xids on the left side of zero?)....

Thanks,
Michail.

#3Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Michail Nikolaev (#1)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Thu, 5 Jan 2023 at 14:12, Michail Nikolaev <michail.nikolaev@gmail.com>
wrote:

Hello, hackers.

It seems like PG 14 works incorrectly with vacuum_defer_cleanup_age
(or just not cleared rows, not sure) and SELECT FOR UPDATE + UPDATE.
I am not certain, but hot_standby_feedback probably able to cause the
same issues.

Steps to reproduce:

[steps]

I was able to see such a set of errors (looks scary):

ERROR: MultiXactId 30818104 has not been created yet -- apparent

wraparound

ERROR: could not open file "base/13757/16385.1" (target block
39591744): previous segment is only 24 blocks

This looks quite suspicious too - it wants to access a block at 296GB of
data, where only 196kB exist.

ERROR: attempted to lock invisible tuple
ERROR: could not access status of transaction 38195704
DETAIL: Could not open file "pg_subtrans/0246": No such file or

directory.

I just saw two instances of this "attempted to lock invisible tuple" error
for the 15.1 image (run on Docker in Ubuntu in WSL) with your reproducer
script, so this does not seem to be specific to PG14 (.6).

And, after some vacuum and restarting the process, I got the following:

client 29 script 0 aborted in command 2 query 0: ERROR: heap tid from
index tuple (111,1) points past end of heap page line pointer array at
offset 262 of block 1 in index "something_is_wrong_here_pkey"

There is indeed something wrong there; the page can't be read by
pageinspect:

$ select get_raw_page('public.something_is_wrong_here', 111)::bytea;
ERROR: invalid page in block 111 of relation base/5/16385

I don't have access to the v14 data anymore (I tried a restart, which
dropped the data :-( ), but will retain my v15 instance for some time to
help any debugging.

Kind regards,

Matthias van de Meent

In reply to: Matthias van de Meent (#3)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Thu, Jan 5, 2023 at 1:49 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

client 29 script 0 aborted in command 2 query 0: ERROR: heap tid from index tuple (111,1) points past end of heap page line pointer array at offset 262 of block 1 in index "something_is_wrong_here_pkey"

This particular error message is from the hardening added to Postgres
15 in commit e7428a99. So it's not surprising that Michail didn't see
the same error on 14.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#4)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Thu, Jan 5, 2023 at 3:27 PM Peter Geoghegan <pg@bowt.ie> wrote:

This particular error message is from the hardening added to Postgres
15 in commit e7428a99. So it's not surprising that Michail didn't see
the same error on 14.

Reproduced this on HEAD locally (no docker), without any difficulty.

FWIW, I find that the "Assert(ItemIdIsNormal(lp));" at the top of
heap_lock_tuple() is the first thing that fails on my assert-enabled
build.

--
Peter Geoghegan

#6Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: Peter Geoghegan (#5)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hello!

Thanks for checking the issue!

FWIW, I find that the "Assert(ItemIdIsNormal(lp));" at the top of
heap_lock_tuple() is the first thing that fails on my assert-enabled
build.

Yes, the same for me:

TRAP: failed Assert("ItemIdIsNormal(lp)"), File: "heapam.c",
Line: 4292, PID: 33416

Reproduced this on HEAD locally (no docker), without any difficulty.

It is a little bit harder without docker in my case, need to adjust
connections and number of threads:

pgbench -c 90 -j 8 -n -f reproduce.bench 'port= 5432
host=localhost user=postgres dbname=postgres password=postgres' -T
2000 -P 1

Best regards,
Michail.

#7Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: Michail Nikolaev (#6)
3 attachment(s)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hello.

The few things I have got so far:

1) It is not required to order by random() to reproduce the issue - it
could be done using queries like:

BEGIN;
SELECT omg.*
FROM something_is_wrong_here AS omg
ORDER BY value -- change is here
LIMIT 1
FOR UPDATE
\gset

UPDATE something_is_wrong_here SET value = :value + 1 WHERE id = :id;
COMMIT;

But for some reason it is harder to reproduce without random in my
case (typically need to wait for about a minute with 100 connections).

2) It is not an issue at table creation time. Issue is reproducible if
vacuum_defer_cleanup_age set after table preparation.

3) To reproduce the issue, vacuum_defer_cleanup_age should flip xid
over zero (be >= txid_current()).
And it is stable.... So, for example - unable to reproduce with 733
value, but 734 gives error each time.
Just a single additional txid_current() (after data is filled) fixes a
crash... It looks like the first SELECT FOR UPDATE + UPDATE silently
poisons everything somehow.
You could use such PSQL script:

DROP TABLE IF EXISTS something_is_wrong_here;

CREATE TABLE something_is_wrong_here (id bigserial PRIMARY KEY,
value numeric(15,4) DEFAULT 0 NOT NULL);

INSERT INTO something_is_wrong_here (value) (SELECT 10000 from
generate_series(0, 100));

SELECT txid_current() \gset

SELECT :txid_current + 1 as txid \gset

ALTER SYSTEM SET vacuum_defer_cleanup_age to :txid;SELECT
pg_reload_conf();

I have attached some scripts if someone goes to reproduce.

Best regards,
Michail.

Attachments:

reproduce.shapplication/x-shellscript; name=reproduce.shDownload
reproduce.benchapplication/octet-stream; name=reproduce.benchDownload
scheme.sqlapplication/sql; name=scheme.sqlDownload
#8Andres Freund
andres@anarazel.de
In reply to: Michail Nikolaev (#2)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hi,

Thomas, CCing you because of the 64bit xid representation aspect.

On 2023-01-06 00:39:44 +0300, Michail Nikolaev wrote:

I apologize for the direct ping, but I think your snapshot scalability
work in PG14 could be related to the issue.

Good call!

The TransactionIdRetreatedBy implementation looks correct... But with
txid_current=212195 I see errors like "could not access status of
transaction 58643736"...
So, maybe vacuum_defer_cleanup_age just highlights some special case
(something with "previous" xids on the left side of zero?)....

I think the bug is close to TransactionIdRetreatedBy(). Arguably in
FullXidRelativeTo(). Or a more fundamental data representation issue with
64bit xids.

To explain, here's a trace to the bottom of GetSnapshotData() leading to the
problem:

In the case I'm looking at here we end up with 720:
oldestfxid = FullXidRelativeTo(latest_completed, oldestxid);
and xmin is 255271, both correct.

Then in TransactionIdRetreatedBy:
/* apply vacuum_defer_cleanup_age */
def_vis_xid_data =
TransactionIdRetreatedBy(xmin, vacuum_defer_cleanup_age);

things start to be iffy. Because we retreat by vacuum_defer_cleanup_age, which
was set to txid_current() in scheme.sql, and the xmin above is that xid,
TransactionIdRetreatedBy() first ends up with 0. It then backtracks further to
the highest 32 xid (i.e. 4294967295). So far so good.

We could obviously end up with values further in the past as well, if
vacuum_defer_cleanup_age were larger.

Things start to seriously go off the rails when we convert that 32bit xid to
64 bit with:
def_vis_fxid = FullXidRelativeTo(latest_completed, def_vis_xid);
which returns {value = 18446744073709551615}, which 0-1 in 64bit.

However, as 64bit xids are not supposed to wrap around, we're in trouble -
it's an xid *very* far into the future. Allowing things to be pruned that
shouldn't, because everything is below that.

I don't quite know how to best fix this. 4294967295 is the correct result for
TransactionIdRetreatedBy() in this case, and it'd not cause problems for
FullXidRelativeTo() if we actually had wrapped around the xid counter before
(since then we'd just represent it as a fxid "of the proper epoch").

Basically, in contrast to 32bit xids, 64bit xids cannot represent xids from
before the start of the universe, whereas with the modulo arithmetic of 32bit
that's not a real problem.

It's probably not too hard to fix specifically in this one place - we could
just clamp vacuum_defer_cleanup_age to be smaller than latest_completed, but
it strikes me as as a somewhat larger issue for the 64it xid infrastructure. I
suspect this might not be the only place running into problems with such
"before the universe" xids.

For a bit I was thinking that we should introduce the notion that a
FullTransactionId can be from the past. Specifically that when the upper 32bit
are all set, we treat the lower 32bit as a value from before xid 0 using the
normal 32bit xid arithmetic. But it sucks to add overhead for that
everywhere.

It might be a bit more palatable to designate one individual value,
e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far
before the start of the universe an xid point to...

FullXidRelativeTo() did assert that the input 32bit xid is in a reasonable
range, but unfortunately didn't do a similar check for the 64bit case.

Greetings,

Andres Freund

#9Andres Freund
andres@anarazel.de
In reply to: Michail Nikolaev (#7)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hi,

On 2023-01-07 21:06:06 +0300, Michail Nikolaev wrote:

2) It is not an issue at table creation time. Issue is reproducible if
vacuum_defer_cleanup_age set after table preparation.

3) To reproduce the issue, vacuum_defer_cleanup_age should flip xid
over zero (be >= txid_current()).
And it is stable.... So, for example - unable to reproduce with 733
value, but 734 gives error each time.
Just a single additional txid_current() (after data is filled) fixes a
crash... It looks like the first SELECT FOR UPDATE + UPDATE silently
poisons everything somehow.
You could use such PSQL script:

FWIW, the concrete value for vacuum_defer_cleanup_age is not crucial to
encounter the problem. It needs to be a value that, when compared to the xid
that did the "INSERT INTO something_is_wrong_here", results in value <= 0.

Setting vacuum_defer_cleanup_age than the xid to a much larger value allows
the crash to be encountered repeatedly.

Greetings,

Andres Freund

#10Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#8)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hi,

On 2023-01-07 16:29:23 -0800, Andres Freund wrote:

It's probably not too hard to fix specifically in this one place - we could
just clamp vacuum_defer_cleanup_age to be smaller than latest_completed, but
it strikes me as as a somewhat larger issue for the 64it xid infrastructure. I
suspect this might not be the only place running into problems with such
"before the universe" xids.

I haven't found other problematic places in HEAD, but did end up find a less
serious version of this bug in < 14: GetFullRecentGlobalXmin(). I did verify
that with vacuum_defer_cleanup_age set GetFullRecentGlobalXmin() returns
values that look likely to cause problems. Its "just" used in gist luckily.

It's hard to find places that do this kind of arithmetic, we traditionally
haven't had a helper for it. So it's open-coded in various ways.

xidStopLimit = xidWrapLimit - 3000000;
if (xidStopLimit < FirstNormalTransactionId)
xidStopLimit -= FirstNormalTransactionId;

and oddly:
xidVacLimit = oldest_datfrozenxid + autovacuum_freeze_max_age;
if (xidVacLimit < FirstNormalTransactionId)
xidVacLimit += FirstNormalTransactionId;

or (in < 14):

RecentGlobalXmin = globalxmin - vacuum_defer_cleanup_age;
if (!TransactionIdIsNormal(RecentGlobalXmin))
RecentGlobalXmin = FirstNormalTransactionId;

The currently existing places I found, other than the ones in procarray.c,
luckily don't seem to convert the xids to 64bit xids.

For a bit I was thinking that we should introduce the notion that a
FullTransactionId can be from the past. Specifically that when the upper 32bit
are all set, we treat the lower 32bit as a value from before xid 0 using the
normal 32bit xid arithmetic. But it sucks to add overhead for that
everywhere.

It might be a bit more palatable to designate one individual value,
e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far
before the start of the universe an xid point to...

On IM Thomas suggested we could reserve the 2^32-1 epoch for invalid values. I
hacked up a patch that converts various fxid functions to inline functions
with such assertions, and it indeed quickly catches the problem this thread
reported, close to the source of the use.

One issue with that is is that it'd reduce what can be input for the xid8
type. But it's hard to believe that'd be a real issue?

It's quite unfortunate that we don't have a test for vacuum_defer_cleanup_age
yet :(.

Greetings,

Andres Freund

#11Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Andres Freund (#10)
1 attachment(s)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Sun, 8 Jan 2023 at 04:09, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2023-01-07 16:29:23 -0800, Andres Freund wrote:

It's probably not too hard to fix specifically in this one place - we could
just clamp vacuum_defer_cleanup_age to be smaller than latest_completed, but
it strikes me as as a somewhat larger issue for the 64it xid infrastructure. I
suspect this might not be the only place running into problems with such
"before the universe" xids.

I haven't found other problematic places in HEAD, but did end up find a less
serious version of this bug in < 14: GetFullRecentGlobalXmin(). I did verify
that with vacuum_defer_cleanup_age set GetFullRecentGlobalXmin() returns
values that look likely to cause problems. Its "just" used in gist luckily.

It's hard to find places that do this kind of arithmetic, we traditionally
haven't had a helper for it. So it's open-coded in various ways.
[...]

The currently existing places I found, other than the ones in procarray.c,
luckily don't seem to convert the xids to 64bit xids.

That's good to know.

For a bit I was thinking that we should introduce the notion that a
FullTransactionId can be from the past. Specifically that when the upper 32bit
are all set, we treat the lower 32bit as a value from before xid 0 using the
normal 32bit xid arithmetic. But it sucks to add overhead for that
everywhere.

It might be a bit more palatable to designate one individual value,
e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far
before the start of the universe an xid point to...

On IM Thomas suggested we could reserve the 2^32-1 epoch for invalid values. I
hacked up a patch that converts various fxid functions to inline functions
with such assertions, and it indeed quickly catches the problem this thread
reported, close to the source of the use.

Wouldn't it be enough to only fix the constructions in
FullXidRelativeTo() and widen_snapshot_xid() (as attached, $topic does
not occur with the patch), and (optionally) bump the first XID
available for any cluster to (FirstNormalXid + 1) to retain the 'older
than any running transaction' property?

The change only fixes the issue for FullTransactionId, which IMO is OK
- I don't think we need to keep xid->xid8->xid symmetric in cases of
xid8 wraparound.

One issue with that is is that it'd reduce what can be input for the xid8
type. But it's hard to believe that'd be a real issue?

Yes, it's unlikely anyone would ever hit that with our current WAL
format - we use 24 bytes /xid just to log it's use, so we'd use at
most epoch 0x1000_0000 in unrealistic scenarios. In addition;
technically, we already have (3*2^32 - 3) "invalid" xid8 values that
can never be produced in FullXidRelativeTo - those few extra invalid
values don't matter much to me except "even more special casing".

Kind regards,

Matthias van de Meent.

Attachments:

v1-0001-Prevent-underflow-of-xid8-epoch.patchapplication/octet-stream; name=v1-0001-Prevent-underflow-of-xid8-epoch.patchDownload
From ea37a15e424385d76a5976ef14fc24174e738bee Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Mon, 9 Jan 2023 17:34:01 +0100
Subject: [PATCH v1] Prevent underflow of xid8 epoch

In some cases where we use xid offsets, it is possible to create a
xid4 that is logically "before" the first transaction ID. With
xid4, that is not much of an issue because we allow them to over-
and underflow as long as the values stay within 2^31 of any active
transaction, but this is not the case for xid8: xid8 assumes a
strictly increasing counter, which disallows under- and overflows.

To still correctly convert xid to xid8, an underflow of the epoch is
handled by (depending on context) truncating the value to either
FirstNormalFullTransactionId or InvalidFullTransactionId.
---
 src/backend/replication/walreceiver.c | 21 +++++++++++++++++++--
 src/backend/storage/ipc/procarray.c   | 11 +++++++++++
 src/backend/utils/adt/xid8funcs.c     | 12 ++++++++++++
 3 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 3876c0188d..5710a42ad6 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1233,10 +1233,27 @@ XLogWalRcvSendHSFeedback(bool immed)
 	nextXid = XidFromFullTransactionId(nextFullXid);
 	xmin_epoch = EpochFromFullTransactionId(nextFullXid);
 	catalog_xmin_epoch = xmin_epoch;
+
+	/*
+	 * If we aren't careful, we can allow the epoch to underflow, resulting
+	 * in all kinds of bad behaviour due to breaking the xid8 sequentiality
+	 * guarantee. So, instead of letting the epoch wrap around, we set the
+	 * xmin to the lowest possible value.
+	 */
 	if (nextXid < xmin)
-		xmin_epoch--;
+	{
+		if (xmin_epoch == 0)
+			xmin = InvalidTransactionId;
+		else
+			--xmin_epoch;
+	}
 	if (nextXid < catalog_xmin)
-		catalog_xmin_epoch--;
+	{
+		if (catalog_xmin_epoch == 0)
+			catalog_xmin = InvalidTransactionId;
+		else
+			catalog_xmin_epoch--;
+	}
 
 	elog(DEBUG2, "sending hot standby feedback xmin %u epoch %u catalog_xmin %u catalog_xmin_epoch %u",
 		 xmin, xmin_epoch, catalog_xmin, catalog_xmin_epoch);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4340bf9641..f83ccacf03 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4305,6 +4305,9 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
  * held by the backend and xid is from a table (where vacuum/freezing ensures
  * the xid has to be within that range), or if xid is from the procarray and
  * prevents xid wraparound that way.
+ *
+ * The function returns FirstNormalTransactionId if provided with a xid
+ * that would make the epoch underflow and wrap around.
  */
 static inline FullTransactionId
 FullXidRelativeTo(FullTransactionId rel, TransactionId xid)
@@ -4317,6 +4320,14 @@ FullXidRelativeTo(FullTransactionId rel, TransactionId xid)
 	/* not guaranteed to find issues, but likely to catch mistakes */
 	AssertTransactionIdInAllowableRange(xid);
 
+	/*
+	 * An epoch of 0 would underflow on this condition and break the
+	 * 'xid8 does not wrap around' guarantee.
+	 */
+	if (unlikely(EpochFromFullTransactionId(rel) == 0 &&
+				 ((int32) (xid - rel_xid)) < 0))
+		return FirstNormalFullTransactionId;
+
 	return FullTransactionIdFromU64(U64FromFullTransactionId(rel)
 									+ (int32) (xid - rel_xid));
 }
diff --git a/src/backend/utils/adt/xid8funcs.c b/src/backend/utils/adt/xid8funcs.c
index e616303a29..55fb838330 100644
--- a/src/backend/utils/adt/xid8funcs.c
+++ b/src/backend/utils/adt/xid8funcs.c
@@ -153,6 +153,9 @@ TransactionIdInRecentPast(FullTransactionId fxid, TransactionId *extracted_xid)
  * FullTransactionId.  Use next_fxid as a reference FullTransactionId, so that
  * we can compute the high order bits.  It must have been obtained by the
  * caller with ReadNextFullTransactionId() after the snapshot was created.
+ *
+ * The function returns FirstNormalTransactionId if provided with a xid
+ * that would make the epoch underflow and wrap around.
  */
 static FullTransactionId
 widen_snapshot_xid(TransactionId xid, FullTransactionId next_fxid)
@@ -172,7 +175,16 @@ widen_snapshot_xid(TransactionId xid, FullTransactionId next_fxid)
 	 * more than one epoch ahead of the TransactionIds in any snapshot.
 	 */
 	if (xid > next_xid)
+	{
+		/*
+		 * We may not allow the epoch to underflow, so instead we'll output
+		 * FirstNormalFullTransactionId as a substitute.
+		 */
+		if (epoch == 0)
+			return FirstNormalFullTransactionId;
+
 		epoch--;
+	}
 
 	return FullTransactionIdFromEpochAndXid(epoch, xid);
 }
-- 
2.30.2

#12Andres Freund
andres@anarazel.de
In reply to: Matthias van de Meent (#11)
3 attachment(s)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hi,

Robert, Mark, CCing you because of the amcheck aspect (see below).

On 2023-01-09 17:50:10 +0100, Matthias van de Meent wrote:

On Sun, 8 Jan 2023 at 04:09, Andres Freund <andres@anarazel.de> wrote:

For a bit I was thinking that we should introduce the notion that a
FullTransactionId can be from the past. Specifically that when the upper 32bit
are all set, we treat the lower 32bit as a value from before xid 0 using the
normal 32bit xid arithmetic. But it sucks to add overhead for that
everywhere.

It might be a bit more palatable to designate one individual value,
e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far
before the start of the universe an xid point to...

On IM Thomas suggested we could reserve the 2^32-1 epoch for invalid values. I
hacked up a patch that converts various fxid functions to inline functions
with such assertions, and it indeed quickly catches the problem this thread
reported, close to the source of the use.

Wouldn't it be enough to only fix the constructions in
FullXidRelativeTo() and widen_snapshot_xid() (as attached, $topic does
not occur with the patch), and (optionally) bump the first XID
available for any cluster to (FirstNormalXid + 1) to retain the 'older
than any running transaction' property?

It's not too hard to fix in individual places, but I suspect that we'll
introduce the bug in future places without some more fundamental protection.

Locally I fixed it by clamping vacuum_defer_cleanup_age to a reasonable value
in ComputeXidHorizons() and GetSnapshotData().

Fixing it in FullXidRelativeTo() doesn't seem quite right, while it's ok to
just return FirstNormalTransactionId in the case of vacuum_defer_cleanup_age,
it doesn't see necessarily correct for other cases.

The change only fixes the issue for FullTransactionId, which IMO is OK
- I don't think we need to keep xid->xid8->xid symmetric in cases of
xid8 wraparound.

I think we should keep that symmetric, it just gets too confusing / easy to
miss bugs otherwise.

One issue with that is is that it'd reduce what can be input for the xid8
type. But it's hard to believe that'd be a real issue?

Yes, it's unlikely anyone would ever hit that with our current WAL
format - we use 24 bytes /xid just to log it's use, so we'd use at
most epoch 0x1000_0000 in unrealistic scenarios. In addition;
technically, we already have (3*2^32 - 3) "invalid" xid8 values that
can never be produced in FullXidRelativeTo - those few extra invalid
values don't matter much to me except "even more special casing".

Yep. The attached 0002 is a first implementation of this.

The new assertions found at least one bug in amcheck, and one further example
of the problem of representing past 32 xids in 64bit:

1) Because ctx->next_xid is set after the XidFromFullTransactionId() call in
update_cached_xid_range(), we end up using the xid 0 (or an outdated value in
subsequent calls) to determine whether epoch needs to be reduced.

2) One test generates includes an xid from the future (4026531839). Which
causes epoch to wrap around (via the epoch--) in
FullTransactionIdFromXidAndCtx(). I've hackily fixed that by just representing
it as an xid from the future instead. But not sure that's a good answer.

A different approach would be to represent fxids as *signed* 64bit
integers. That'd of course loose more range, but could represent everything
accurately, and would have a compatible on-disk representation on two's
complement platforms (all our platforms). I think the only place that'd need
special treatment is U64FromFullTransactionId() / its callers. I think this
might be the most robust approach.

Greetings,

Andres Freund

Attachments:

v2-0001-Fix-corruption-due-to-vacuum_defer_cleanup_age-un.patchtext/x-diff; charset=us-asciiDownload
From 13450d14a1e064dd958b516ffa36b7a24bb49d7b Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 9 Jan 2023 10:23:10 -0800
Subject: [PATCH v2 1/3] Fix corruption due to vacuum_defer_cleanup_age
 underflowing 64bit xids

Author:
Reviewed-by:
Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de
Backpatch:
---
 src/backend/storage/ipc/procarray.c | 73 ++++++++++++++++++++++++-----
 1 file changed, 61 insertions(+), 12 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4340bf96416..d7d18ba8c12 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -367,6 +367,7 @@ static inline void ProcArrayEndTransactionInternal(PGPROC *proc, TransactionId l
 static void ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid);
 static void MaintainLatestCompletedXid(TransactionId latestXid);
 static void MaintainLatestCompletedXidRecovery(TransactionId latestXid);
+static int ClampVacuumDeferCleanupAge(FullTransactionId rel, TransactionId xid);
 
 static inline FullTransactionId FullXidRelativeTo(FullTransactionId rel,
 												  TransactionId xid);
@@ -1888,17 +1889,32 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		 * so guc.c should limit it to no more than the xidStopLimit threshold
 		 * in varsup.c.  Also note that we intentionally don't apply
 		 * vacuum_defer_cleanup_age on standby servers.
+		 *
+		 * Be careful to clamp vacuum_defer_cleanup age to prevent it from
+		 * creating an xid before FirstNormalTransactionId.
 		 */
-		h->oldest_considered_running =
-			TransactionIdRetreatedBy(h->oldest_considered_running,
-									 vacuum_defer_cleanup_age);
-		h->shared_oldest_nonremovable =
-			TransactionIdRetreatedBy(h->shared_oldest_nonremovable,
-									 vacuum_defer_cleanup_age);
-		h->data_oldest_nonremovable =
-			TransactionIdRetreatedBy(h->data_oldest_nonremovable,
-									 vacuum_defer_cleanup_age);
-		/* defer doesn't apply to temp relations */
+		Assert(TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+											 h->shared_oldest_nonremovable));
+		Assert(TransactionIdPrecedesOrEquals(h->shared_oldest_nonremovable,
+											 h->data_oldest_nonremovable));
+
+		if (vacuum_defer_cleanup_age > 0)
+		{
+			int clamped_cleanup_age =
+				ClampVacuumDeferCleanupAge(h->latest_completed,
+										   h->oldest_considered_running);
+
+			h->oldest_considered_running =
+				TransactionIdRetreatedBy(h->oldest_considered_running,
+										 clamped_cleanup_age);
+			h->shared_oldest_nonremovable =
+				TransactionIdRetreatedBy(h->shared_oldest_nonremovable,
+										 clamped_cleanup_age);
+			h->data_oldest_nonremovable =
+				TransactionIdRetreatedBy(h->data_oldest_nonremovable,
+										 clamped_cleanup_age);
+			/* defer doesn't apply to temp relations */
+		}
 	}
 
 	/*
@@ -2469,9 +2485,17 @@ GetSnapshotData(Snapshot snapshot)
 		 */
 		oldestfxid = FullXidRelativeTo(latest_completed, oldestxid);
 
+		def_vis_xid_data = xmin;
+
 		/* apply vacuum_defer_cleanup_age */
-		def_vis_xid_data =
-			TransactionIdRetreatedBy(xmin, vacuum_defer_cleanup_age);
+		if (vacuum_defer_cleanup_age > 0)
+		{
+			int clamped_cleanup_age =
+				ClampVacuumDeferCleanupAge(oldestfxid, oldestxid);
+
+			def_vis_xid_data =
+				TransactionIdRetreatedBy(xmin, clamped_cleanup_age);
+		}
 
 		/* Check whether there's a replication slot requiring an older xmin. */
 		def_vis_xid_data =
@@ -4295,6 +4319,31 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 	return GlobalVisTestIsRemovableXid(state, xid);
 }
 
+/*
+ * Clamp vacuum_defer_cleanup_age, to prevent it from retreating below
+ * FirstNormalTransactionId during epoch 0. This is important to prevent
+ * generating xids that cannot be converted to a FullTransactionId without
+ * wrapping around.
+ */
+static int
+ClampVacuumDeferCleanupAge(FullTransactionId rel, TransactionId oldest_xid)
+{
+	FullTransactionId oldest_fxid;
+	uint64 oldest_fxid_i;
+
+	if (vacuum_defer_cleanup_age == 0)
+		return 0;
+
+	oldest_fxid = FullXidRelativeTo(rel, oldest_xid);
+	oldest_fxid_i = U64FromFullTransactionId(oldest_fxid);
+
+	if (oldest_fxid_i < vacuum_defer_cleanup_age ||
+		(oldest_fxid_i - vacuum_defer_cleanup_age) < FirstNormalTransactionId)
+		return (int) (oldest_fxid_i - FirstNormalTransactionId);
+
+	return vacuum_defer_cleanup_age;
+}
+
 /*
  * Convert a 32 bit transaction id into 64 bit transaction id, by assuming it
  * is within MaxTransactionId / 2 of XidFromFullTransactionId(rel).
-- 
2.38.0

v2-0002-wip-amcheck-Fix-bugs-relating-to-64bit-xids.patchtext/x-diff; charset=us-asciiDownload
From 1b85be918858172cec0c2884169e79576608ed3f Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 9 Jan 2023 11:24:37 -0800
Subject: [PATCH v2 2/3] wip: amcheck: Fix bugs relating to 64bit xids

Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de
---
 contrib/amcheck/verify_heapam.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 4fcfd6df72d..56a3b3e9974 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -1576,11 +1576,25 @@ FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
 {
 	uint32		epoch;
 
+	Assert(TransactionIdIsValid(ctx->next_xid));
+	Assert(FullTransactionIdIsValid(ctx->next_fxid));
+
 	if (!TransactionIdIsNormal(xid))
 		return FullTransactionIdFromEpochAndXid(0, xid);
 	epoch = EpochFromFullTransactionId(ctx->next_fxid);
 	if (xid > ctx->next_xid)
+	{
+		/*
+		 * FIXME, doubtful this is the best fix.
+		 *
+		 * Can't represent the 32bit xid as a 64bit xid, as it's before fxid
+		 * 0. Represent it as an xid from the future instead.
+		 */
+		if (epoch == 0)
+			return FullTransactionIdFromEpochAndXid(0, xid);
+
 		epoch--;
+	}
 	return FullTransactionIdFromEpochAndXid(epoch, xid);
 }
 
@@ -1597,8 +1611,8 @@ update_cached_xid_range(HeapCheckContext *ctx)
 	LWLockRelease(XidGenLock);
 
 	/* And compute alternate versions of the same */
-	ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
 	ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
+	ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
 }
 
 /*
-- 
2.38.0

v2-0003-wip-stricter-validation-for-64bit-xids.patchtext/x-diff; charset=us-asciiDownload
From 2fccf780a517416283ad739acb6df4cbcf2f2b28 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 9 Jan 2023 11:25:43 -0800
Subject: [PATCH v2 3/3] wip: stricter validation for 64bit xids

64bit xids can't represent xids before xid 0, but 32bit xids can. This has
lead to a number of bugs, some data corrupting. To make it easier to catch
such bugs, disallow 64bit where the upper 32bit are all set, which is what one
gets when naively trying to convert such 32bit xids. Additionally explicitly
disallow 64bit xids that are already aren't allowed because they'd map to
special 32bit xids.

This changes the input routines for 64bit xids to error out in more cases.

Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de
---
 src/include/access/transam.h      | 136 ++++++++++++++++++++++++++++--
 src/backend/utils/adt/xid.c       |   7 ++
 src/backend/utils/adt/xid8funcs.c |  17 ++--
 src/test/regress/expected/xid.out |  44 ++++++++--
 src/test/regress/sql/xid.sql      |  12 ++-
 5 files changed, 192 insertions(+), 24 deletions(-)

diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index f5af6d30556..0d4bf57e63a 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -44,15 +44,6 @@
 #define TransactionIdStore(xid, dest)	(*(dest) = (xid))
 #define StoreInvalidTransactionId(dest) (*(dest) = InvalidTransactionId)
 
-#define EpochFromFullTransactionId(x)	((uint32) ((x).value >> 32))
-#define XidFromFullTransactionId(x)		((uint32) (x).value)
-#define U64FromFullTransactionId(x)		((x).value)
-#define FullTransactionIdEquals(a, b)	((a).value == (b).value)
-#define FullTransactionIdPrecedes(a, b)	((a).value < (b).value)
-#define FullTransactionIdPrecedesOrEquals(a, b) ((a).value <= (b).value)
-#define FullTransactionIdFollows(a, b) ((a).value > (b).value)
-#define FullTransactionIdFollowsOrEquals(a, b) ((a).value >= (b).value)
-#define FullTransactionIdIsValid(x)		TransactionIdIsValid(XidFromFullTransactionId(x))
 #define InvalidFullTransactionId		FullTransactionIdFromEpochAndXid(0, InvalidTransactionId)
 #define FirstNormalFullTransactionId	FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId)
 #define FullTransactionIdIsNormal(x)	FullTransactionIdFollowsOrEquals(x, FirstNormalFullTransactionId)
@@ -61,12 +52,127 @@
  * A 64 bit value that contains an epoch and a TransactionId.  This is
  * wrapped in a struct to prevent implicit conversion to/from TransactionId.
  * Not all values represent valid normal XIDs.
+ *
+ * 64bit xids that would map to a non-normal 32bit xid are not allowed, for
+ * obvious reasons.
+
+ * In addition, no xids with all the upper 32bit sets are allowed to exist,
+ * this is to make it easier to catch conversion errors from 32bit xids (which
+ * can point to "before" xid 0).
  */
 typedef struct FullTransactionId
 {
 	uint64		value;
 } FullTransactionId;
 
+#define MAX_FULL_TRANSACTION_ID ((((uint64)~(uint32)0)<<32) - 1)
+
+/*
+ * This is separate from FullTransactionIdValidRange() so that input routines
+ * can check for invalid values without triggering an assert.
+ */
+static inline bool
+InFullTransactionIdRange(uint64 val)
+{
+	/* 64bit xid above the upper bound */
+	if (val > MAX_FULL_TRANSACTION_ID)
+		return false;
+
+	/*
+	 * normal 64bit xids that'd map to special 32 xids aren't allowed.
+	 */
+	if (val >= (uint64) FirstNormalTransactionId &&
+		(uint32) val < FirstNormalTransactionId)
+		return false;
+	return true;
+}
+
+/*
+ * Validation routine, typically used in asserts.
+ */
+static inline bool
+FullTransactionIdValidRange(FullTransactionId x)
+{
+	return InFullTransactionIdRange(x.value);
+}
+
+static inline uint64
+U64FromFullTransactionId(FullTransactionId x)
+{
+	Assert(FullTransactionIdValidRange(x));
+
+	return x.value;
+}
+
+static inline TransactionId
+XidFromFullTransactionId(FullTransactionId x)
+{
+	Assert(FullTransactionIdValidRange(x));
+
+	return (TransactionId) x.value;
+}
+
+static inline uint32
+EpochFromFullTransactionId(FullTransactionId x)
+{
+	Assert(FullTransactionIdValidRange(x));
+
+	return (uint32) (x.value >> 32);
+}
+
+static inline bool
+FullTransactionIdEquals(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value == b.value;
+}
+
+static inline bool
+FullTransactionIdPrecedes(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value < b.value;
+}
+
+static inline bool
+FullTransactionIdPrecedesOrEquals(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value <= b.value;
+}
+
+static inline bool
+FullTransactionIdFollows(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value > b.value;
+}
+
+static inline bool
+FullTransactionIdFollowsOrEquals(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value >= b.value;
+}
+
+static inline bool
+FullTransactionIdIsValid(FullTransactionId x)
+{
+	Assert(FullTransactionIdValidRange(x));
+
+	return x.value != (uint64) InvalidTransactionId;
+}
+
 static inline FullTransactionId
 FullTransactionIdFromEpochAndXid(uint32 epoch, TransactionId xid)
 {
@@ -74,6 +180,8 @@ FullTransactionIdFromEpochAndXid(uint32 epoch, TransactionId xid)
 
 	result.value = ((uint64) epoch) << 32 | xid;
 
+	Assert(FullTransactionIdValidRange(result));
+
 	return result;
 }
 
@@ -84,6 +192,8 @@ FullTransactionIdFromU64(uint64 value)
 
 	result.value = value;
 
+	Assert(FullTransactionIdValidRange(result));
+
 	return result;
 }
 
@@ -102,6 +212,8 @@ FullTransactionIdFromU64(uint64 value)
 static inline void
 FullTransactionIdRetreat(FullTransactionId *dest)
 {
+	Assert(FullTransactionIdValidRange(*dest));
+
 	dest->value--;
 
 	/*
@@ -118,6 +230,8 @@ FullTransactionIdRetreat(FullTransactionId *dest)
 	 */
 	while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
 		dest->value--;
+
+	Assert(FullTransactionIdValidRange(*dest));
 }
 
 /*
@@ -127,6 +241,8 @@ FullTransactionIdRetreat(FullTransactionId *dest)
 static inline void
 FullTransactionIdAdvance(FullTransactionId *dest)
 {
+	Assert(FullTransactionIdValidRange(*dest));
+
 	dest->value++;
 
 	/* see FullTransactionIdAdvance() */
@@ -135,6 +251,8 @@ FullTransactionIdAdvance(FullTransactionId *dest)
 
 	while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
 		dest->value++;
+
+	Assert(FullTransactionIdValidRange(*dest));
 }
 
 /* back up a transaction ID variable, handling wraparound correctly */
diff --git a/src/backend/utils/adt/xid.c b/src/backend/utils/adt/xid.c
index 8ac1679c381..21c0f4c67df 100644
--- a/src/backend/utils/adt/xid.c
+++ b/src/backend/utils/adt/xid.c
@@ -188,6 +188,13 @@ xid8in(PG_FUNCTION_ARGS)
 	uint64		result;
 
 	result = uint64in_subr(str, NULL, "xid8", fcinfo->context);
+
+	if (!InFullTransactionIdRange(result))
+		ereturn(fcinfo->context, 0,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("value \"%s\" is out of range for type %s",
+						str, "xid8")));
+
 	PG_RETURN_FULLTRANSACTIONID(FullTransactionIdFromU64(result));
 }
 
diff --git a/src/backend/utils/adt/xid8funcs.c b/src/backend/utils/adt/xid8funcs.c
index e616303a292..b8d8d92f793 100644
--- a/src/backend/utils/adt/xid8funcs.c
+++ b/src/backend/utils/adt/xid8funcs.c
@@ -302,15 +302,18 @@ parse_snapshot(const char *str, Node *escontext)
 	const char *str_start = str;
 	char	   *endp;
 	StringInfo	buf;
+	uint64 raw_fxid;
 
-	xmin = FullTransactionIdFromU64(strtou64(str, &endp, 10));
-	if (*endp != ':')
+	raw_fxid = strtou64(str, &endp, 10);
+	if (*endp != ':' || !InFullTransactionIdRange(raw_fxid))
 		goto bad_format;
+	xmin = FullTransactionIdFromU64(raw_fxid);
 	str = endp + 1;
 
-	xmax = FullTransactionIdFromU64(strtou64(str, &endp, 10));
-	if (*endp != ':')
+	raw_fxid = strtou64(str, &endp, 10);
+	if (*endp != ':' || !InFullTransactionIdRange(raw_fxid))
 		goto bad_format;
+	xmax = FullTransactionIdFromU64(raw_fxid);
 	str = endp + 1;
 
 	/* it should look sane */
@@ -326,7 +329,11 @@ parse_snapshot(const char *str, Node *escontext)
 	while (*str != '\0')
 	{
 		/* read next value */
-		val = FullTransactionIdFromU64(strtou64(str, &endp, 10));
+		raw_fxid = strtou64(str, &endp, 10);
+
+		val = FullTransactionIdFromU64(raw_fxid);
+		if (!InFullTransactionIdRange(raw_fxid))
+			goto bad_format;
 		str = endp;
 
 		/* require the input to be in order */
diff --git a/src/test/regress/expected/xid.out b/src/test/regress/expected/xid.out
index e62f7019434..4348eb66bd5 100644
--- a/src/test/regress/expected/xid.out
+++ b/src/test/regress/expected/xid.out
@@ -6,11 +6,11 @@ select '010'::xid,
        '-1'::xid,
 	   '010'::xid8,
 	   '42'::xid8,
-	   '0xffffffffffffffff'::xid8,
-	   '-1'::xid8;
- xid | xid |    xid     |    xid     | xid8 | xid8 |         xid8         |         xid8         
------+-----+------------+------------+------+------+----------------------+----------------------
-   8 |  42 | 4294967295 | 4294967295 |    8 |   42 | 18446744073709551615 | 18446744073709551615
+	   '0xefffffffffffffff'::xid8,
+	   '0'::xid8;
+ xid | xid |    xid     |    xid     | xid8 | xid8 |         xid8         | xid8 
+-----+-----+------------+------------+------+------+----------------------+------
+   8 |  42 | 4294967295 | 4294967295 |    8 |   42 | 17293822569102704639 |    0
 (1 row)
 
 -- garbage values
@@ -30,6 +30,18 @@ select 'asdf'::xid8;
 ERROR:  invalid input syntax for type xid8: "asdf"
 LINE 1: select 'asdf'::xid8;
                ^
+select '-1'::xid8;
+ERROR:  value "-1" is out of range for type xid8
+LINE 1: select '-1'::xid8;
+               ^
+select '0xffffffffffffffff'::xid8;
+ERROR:  value "0xffffffffffffffff" is out of range for type xid8
+LINE 1: select '0xffffffffffffffff'::xid8;
+               ^
+select '0x0000300000000001'::xid8;
+ERROR:  value "0x0000300000000001" is out of range for type xid8
+LINE 1: select '0x0000300000000001'::xid8;
+               ^
 -- Also try it with non-error-throwing API
 SELECT pg_input_is_valid('42', 'xid');
  pg_input_is_valid 
@@ -67,6 +79,24 @@ SELECT pg_input_error_message('0xffffffffffffffffffff', 'xid8');
  value "0xffffffffffffffffffff" is out of range for type xid8
 (1 row)
 
+SELECT pg_input_is_valid('-1', 'xid8');
+ pg_input_is_valid 
+-------------------
+ f
+(1 row)
+
+SELECT pg_input_is_valid('0xffffffffffffffff', 'xid8');
+ pg_input_is_valid 
+-------------------
+ f
+(1 row)
+
+SELECT pg_input_is_valid('0x0000300000000001', 'xid8');
+ pg_input_is_valid 
+-------------------
+ f
+(1 row)
+
 -- equality
 select '1'::xid = '1'::xid;
  ?column? 
@@ -160,11 +190,11 @@ select xid8cmp('1', '2'), xid8cmp('2', '2'), xid8cmp('2', '1');
 
 -- min() and max() for xid8
 create table xid8_t1 (x xid8);
-insert into xid8_t1 values ('0'), ('010'), ('42'), ('0xffffffffffffffff'), ('-1');
+insert into xid8_t1 values ('0'), ('010'), ('42'), ('0xefffffffffffffff');
 select min(x), max(x) from xid8_t1;
  min |         max          
 -----+----------------------
-   0 | 18446744073709551615
+   0 | 17293822569102704639
 (1 row)
 
 -- xid8 has btree and hash opclasses
diff --git a/src/test/regress/sql/xid.sql b/src/test/regress/sql/xid.sql
index b6996588ef6..b2104e4ce20 100644
--- a/src/test/regress/sql/xid.sql
+++ b/src/test/regress/sql/xid.sql
@@ -7,14 +7,17 @@ select '010'::xid,
        '-1'::xid,
 	   '010'::xid8,
 	   '42'::xid8,
-	   '0xffffffffffffffff'::xid8,
-	   '-1'::xid8;
+	   '0xefffffffffffffff'::xid8,
+	   '0'::xid8;
 
 -- garbage values
 select ''::xid;
 select 'asdf'::xid;
 select ''::xid8;
 select 'asdf'::xid8;
+select '-1'::xid8;
+select '0xffffffffffffffff'::xid8;
+select '0x0000300000000001'::xid8;
 
 -- Also try it with non-error-throwing API
 SELECT pg_input_is_valid('42', 'xid');
@@ -23,6 +26,9 @@ SELECT pg_input_error_message('0xffffffffff', 'xid');
 SELECT pg_input_is_valid('42', 'xid8');
 SELECT pg_input_is_valid('asdf', 'xid8');
 SELECT pg_input_error_message('0xffffffffffffffffffff', 'xid8');
+SELECT pg_input_is_valid('-1', 'xid8');
+SELECT pg_input_is_valid('0xffffffffffffffff', 'xid8');
+SELECT pg_input_is_valid('0x0000300000000001', 'xid8');
 
 -- equality
 select '1'::xid = '1'::xid;
@@ -51,7 +57,7 @@ select xid8cmp('1', '2'), xid8cmp('2', '2'), xid8cmp('2', '1');
 
 -- min() and max() for xid8
 create table xid8_t1 (x xid8);
-insert into xid8_t1 values ('0'), ('010'), ('42'), ('0xffffffffffffffff'), ('-1');
+insert into xid8_t1 values ('0'), ('010'), ('42'), ('0xefffffffffffffff');
 select min(x), max(x) from xid8_t1;
 
 -- xid8 has btree and hash opclasses
-- 
2.38.0

#13Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#12)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Tue, Jan 10, 2023 at 8:34 AM Andres Freund <andres@anarazel.de> wrote:

A different approach would be to represent fxids as *signed* 64bit
integers. That'd of course loose more range, but could represent everything
accurately, and would have a compatible on-disk representation on two's
complement platforms (all our platforms). I think the only place that'd need
special treatment is U64FromFullTransactionId() / its callers. I think this
might be the most robust approach.

It does sound like an interesting approach; it means you are free to
retreat arbitrarily without ever thinking about it, and by the
arguments given (LSN space required to consume fxids) it's still
'enough'. Essentially all these bugs are places where the author
already believed it worked that way.

(Two's complement is required in the C23 draft.)

#14Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andres Freund (#12)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Jan 9, 2023, at 11:34 AM, Andres Freund <andres@anarazel.de> wrote:

1) Because ctx->next_xid is set after the XidFromFullTransactionId() call in
update_cached_xid_range(), we end up using the xid 0 (or an outdated value in
subsequent calls) to determine whether epoch needs to be reduced.

Can you say a bit more about your analysis here, preferably with pointers to the lines of code you are analyzing? Does the problem exist in amcheck as currently committed, or are you thinking about a problem that arises only after applying your patch? I'm a bit fuzzy on where xid 0 gets used.

Thanks


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Andres Freund
andres@anarazel.de
In reply to: Mark Dilger (#14)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hi,

On 2023-01-09 13:55:02 -0800, Mark Dilger wrote:

On Jan 9, 2023, at 11:34 AM, Andres Freund <andres@anarazel.de> wrote:

1) Because ctx->next_xid is set after the XidFromFullTransactionId() call in
update_cached_xid_range(), we end up using the xid 0 (or an outdated value in
subsequent calls) to determine whether epoch needs to be reduced.

Can you say a bit more about your analysis here, preferably with pointers to
the lines of code you are analyzing? Does the problem exist in amcheck as
currently committed, or are you thinking about a problem that arises only
after applying your patch? I'm a bit fuzzy on where xid 0 gets used.

The problems exist in the code as currently committed. I'm not sure what
exactly the consequences are, the result is that oldest_fxid will be, at least
temporarily, bogus.

Consider the first call to update_cached_xid_range():

/*
* Update our cached range of valid transaction IDs.
*/
static void
update_cached_xid_range(HeapCheckContext *ctx)
{
/* Make cached copies */
LWLockAcquire(XidGenLock, LW_SHARED);
ctx->next_fxid = ShmemVariableCache->nextXid;
ctx->oldest_xid = ShmemVariableCache->oldestXid;
LWLockRelease(XidGenLock);

/* And compute alternate versions of the same */
ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
}

The problem is that the call to FullTransactionIdFromXidAndCtx() happens
before ctx->next_xid is assigned, even though FullTransactionIdFromXidAndCtx()
uses ctx->next_xid.

static FullTransactionId
FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
{
uint32 epoch;

if (!TransactionIdIsNormal(xid))
return FullTransactionIdFromEpochAndXid(0, xid);
epoch = EpochFromFullTransactionId(ctx->next_fxid);
if (xid > ctx->next_xid)
epoch--;
return FullTransactionIdFromEpochAndXid(epoch, xid);
}

Because ctx->next_xid is 0, due to not having been set yet, "xid > ctx->next_xid"
will always be true, leading to epoch being reduced by one.

In the common case of there never having been an xid wraparound, we'll thus
underflow epoch, generating an xid far into the future.

The tests encounter the issue today. If you add
Assert(TransactionIdIsValid(ctx->next_xid));
Assert(FullTransactionIdIsValid(ctx->next_fxid));
early in FullTransactionIdFromXidAndCtx() it'll be hit in the
amcheck/pg_amcheck tests.

Greetings,

Andres Freund

#16Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andres Freund (#15)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Jan 9, 2023, at 2:07 PM, Andres Freund <andres@anarazel.de> wrote:

The tests encounter the issue today. If you add
Assert(TransactionIdIsValid(ctx->next_xid));
Assert(FullTransactionIdIsValid(ctx->next_fxid));
early in FullTransactionIdFromXidAndCtx() it'll be hit in the
amcheck/pg_amcheck tests.

Ok, I can confirm that. I find the assertion

Assert(epoch != (uint32)-1);

a bit simpler to reason about, but either way, I agree it is a bug. Thanks for finding this.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Andres Freund (#12)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Mon, 9 Jan 2023 at 20:34, Andres Freund <andres@anarazel.de> wrote:

On 2023-01-09 17:50:10 +0100, Matthias van de Meent wrote:

Wouldn't it be enough to only fix the constructions in
FullXidRelativeTo() and widen_snapshot_xid() (as attached, $topic does
not occur with the patch), and (optionally) bump the first XID
available for any cluster to (FirstNormalXid + 1) to retain the 'older
than any running transaction' property?

It's not too hard to fix in individual places, but I suspect that we'll
introduce the bug in future places without some more fundamental protection.

Locally I fixed it by clamping vacuum_defer_cleanup_age to a reasonable value
in ComputeXidHorizons() and GetSnapshotData().

I don't think that clamping the value with oldestXid (as seen in patch
0001, in GetSnapshotData) is right.
It would clamp the value relative to the oldest frozen xid of all
databases, which can be millions of transactions behind oldestXmin,
and thus severely skew the amount of transaction's changes you keep on
disk (that is, until oldestXid moves past 1000_000).
A similar case can be made for the changes in ComputeXidHorizons - for
the described behaviour of vacuum_defer_cleanup_age we would need to
clamp the used offset separately for each of the fields in the horizon
result to retain all transaction data for the first 1 million
transactions, and the ones that may still see these transactions.

Fixing it in FullXidRelativeTo() doesn't seem quite right, while it's ok to
just return FirstNormalTransactionId in the case of vacuum_defer_cleanup_age,
it doesn't see necessarily correct for other cases.

Understood.

Kind regards,

Matthias van de Meent

#18Andres Freund
andres@anarazel.de
In reply to: Matthias van de Meent (#17)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hi,

On 2023-01-10 15:03:42 +0100, Matthias van de Meent wrote:

On Mon, 9 Jan 2023 at 20:34, Andres Freund <andres@anarazel.de> wrote:

On 2023-01-09 17:50:10 +0100, Matthias van de Meent wrote:

Wouldn't it be enough to only fix the constructions in
FullXidRelativeTo() and widen_snapshot_xid() (as attached, $topic does
not occur with the patch), and (optionally) bump the first XID
available for any cluster to (FirstNormalXid + 1) to retain the 'older
than any running transaction' property?

It's not too hard to fix in individual places, but I suspect that we'll
introduce the bug in future places without some more fundamental protection.

Locally I fixed it by clamping vacuum_defer_cleanup_age to a reasonable value
in ComputeXidHorizons() and GetSnapshotData().

I don't think that clamping the value with oldestXid (as seen in patch
0001, in GetSnapshotData) is right.

I agree that using oldestXid to clamp is problematic.

It would clamp the value relative to the oldest frozen xid of all
databases, which can be millions of transactions behind oldestXmin,
and thus severely skew the amount of transaction's changes you keep on
disk (that is, until oldestXid moves past 1000_000).

What precisely do you mean with "skew" here? Do you just mean that it'd take a
long time until vacuum_defer_cleanup_age takes effect? Somehow it sounds like
you might mean more than that?

I'm tempted to go with reinterpreting 64bit xids as signed. Except that it
seems like a mighty invasive change to backpatch.

Greetings,

Andres Freund

#19Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Andres Freund (#18)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Tue, 10 Jan 2023 at 20:14, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2023-01-10 15:03:42 +0100, Matthias van de Meent wrote:

On Mon, 9 Jan 2023 at 20:34, Andres Freund <andres@anarazel.de> wrote:

It's not too hard to fix in individual places, but I suspect that we'll
introduce the bug in future places without some more fundamental protection.

Locally I fixed it by clamping vacuum_defer_cleanup_age to a reasonable value
in ComputeXidHorizons() and GetSnapshotData().

I don't think that clamping the value with oldestXid (as seen in patch
0001, in GetSnapshotData) is right.

I agree that using oldestXid to clamp is problematic.

It would clamp the value relative to the oldest frozen xid of all
databases, which can be millions of transactions behind oldestXmin,
and thus severely skew the amount of transaction's changes you keep on
disk (that is, until oldestXid moves past 1000_000).

What precisely do you mean with "skew" here? Do you just mean that it'd take a
long time until vacuum_defer_cleanup_age takes effect? Somehow it sounds like
you might mean more than that?

h->oldest_considered_running can be extremely old due to the global
nature of the value and the potential existence of a snapshot in
another database that started in parallel to a very old running
transaction.

Example: With vacuum_defer_cleanup_age set to 1000000, it is possible
that a snapshot in another database (thus another backend) would
result in a local intermediate status result of h->o_c_r = 20,
h->s_o_n = 20, h->d_o_n = 10030. The clamped offset would then be 20
(clamped using h->o_c_r), which updates h->data_oldest_nonremovable to
10010. The obvious result is that all but the last 20 transactions
from this database's data files are available for cleanup, which
contradicts with the intention of the vacuum_defer_cleanup_age GUC.

I'm tempted to go with reinterpreting 64bit xids as signed. Except that it
seems like a mighty invasive change to backpatch.

I'm not sure either. Protecting against underflow by halving the
effective valid value space is quite the intervention, but if it is
necessary to make this work in a performant manner, it would be worth
it. Maybe someone else with more experience can provide their opinion
here.

Kind regards,

Matthias van de Meent

#20Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: Matthias van de Meent (#19)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hello.

I have registered it as patch in the commit fest:
https://commitfest.postgresql.org/42/4138/

Best regards,
Michail.

#21Andres Freund
andres@anarazel.de
In reply to: Matthias van de Meent (#19)
4 attachment(s)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hi,

On 2023-01-10 21:32:54 +0100, Matthias van de Meent wrote:

On Tue, 10 Jan 2023 at 20:14, Andres Freund <andres@anarazel.de> wrote:

On 2023-01-10 15:03:42 +0100, Matthias van de Meent wrote:
What precisely do you mean with "skew" here? Do you just mean that it'd take a
long time until vacuum_defer_cleanup_age takes effect? Somehow it sounds like
you might mean more than that?

h->oldest_considered_running can be extremely old due to the global
nature of the value and the potential existence of a snapshot in
another database that started in parallel to a very old running
transaction.

Here's a version that, I think, does not have that issue.

In an earlier, not posted, version I had an vacuum_defer_cleanup_age specific
helper function for this, but it seems likely we'll need it in other places
too. So I named it TransactionIdRetreatSafely(). I made it accept the xid by
pointer, as the line lengths / repetition otherwise end up making it hard to
read the code. For now I have TransactionIdRetreatSafely() be private to
procarray.c, but I expect we'll have to change that eventually.

Not sure I like TransactionIdRetreatSafely() as a name. Maybe
TransactionIdRetreatClamped() is better?

I've been working on a test for vacuum_defer_cleanup_age. It does catch the
corruption at hand, but not much more. It's quite painful to write, right
now. Some of the reasons:
/messages/by-id/20230130194350.zj5v467x4jgqt3d6@awork3.anarazel.de

I'm tempted to go with reinterpreting 64bit xids as signed. Except that it
seems like a mighty invasive change to backpatch.

I'm not sure either. Protecting against underflow by halving the
effective valid value space is quite the intervention, but if it is
necessary to make this work in a performant manner, it would be worth
it. Maybe someone else with more experience can provide their opinion
here.

The attached assertions just removes 1/2**32'ths of the space, by reserving
the xid range with the upper 32bit set as something that shouldn't be
reachable.

Still requires us to change the input routines to reject that range, but I
think that's a worthy tradeoff. I didn't find the existing limits for the
type to be documented anywhere.

Obviously something like that could only go into HEAD.

Greetings,

Andres Freund

Attachments:

v3-0001-WIP-Fix-corruption-due-to-vacuum_defer_cleanup_ag.patchtext/x-diff; charset=us-asciiDownload
From 8be634900796630a772c0131925f38f136fed599 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 9 Jan 2023 10:23:10 -0800
Subject: [PATCH v3 1/6] WIP: Fix corruption due to vacuum_defer_cleanup_age
 underflowing 64bit xids

Author:
Reviewed-by:
Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de
Backpatch:
---
 src/backend/storage/ipc/procarray.c | 85 +++++++++++++++++++++++++----
 1 file changed, 73 insertions(+), 12 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4340bf96416..64d0896b23b 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -367,6 +367,9 @@ static inline void ProcArrayEndTransactionInternal(PGPROC *proc, TransactionId l
 static void ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid);
 static void MaintainLatestCompletedXid(TransactionId latestXid);
 static void MaintainLatestCompletedXidRecovery(TransactionId latestXid);
+static void TransactionIdRetreatSafely(TransactionId *xid,
+									   int retreat_by,
+									   FullTransactionId rel);
 
 static inline FullTransactionId FullXidRelativeTo(FullTransactionId rel,
 												  TransactionId xid);
@@ -1888,17 +1891,35 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		 * so guc.c should limit it to no more than the xidStopLimit threshold
 		 * in varsup.c.  Also note that we intentionally don't apply
 		 * vacuum_defer_cleanup_age on standby servers.
+		 *
+		 * Need to use TransactionIdRetreatSafely() instead of open-coding the
+		 * subtraction, to prevent creating an xid before
+		 * FirstNormalTransactionId.
 		 */
-		h->oldest_considered_running =
-			TransactionIdRetreatedBy(h->oldest_considered_running,
-									 vacuum_defer_cleanup_age);
-		h->shared_oldest_nonremovable =
-			TransactionIdRetreatedBy(h->shared_oldest_nonremovable,
-									 vacuum_defer_cleanup_age);
-		h->data_oldest_nonremovable =
-			TransactionIdRetreatedBy(h->data_oldest_nonremovable,
-									 vacuum_defer_cleanup_age);
-		/* defer doesn't apply to temp relations */
+		Assert(TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+											 h->shared_oldest_nonremovable));
+		Assert(TransactionIdPrecedesOrEquals(h->shared_oldest_nonremovable,
+											 h->data_oldest_nonremovable));
+
+		if (vacuum_defer_cleanup_age > 0)
+		{
+			TransactionIdRetreatSafely(&h->oldest_considered_running,
+									   vacuum_defer_cleanup_age,
+									   h->latest_completed);
+			TransactionIdRetreatSafely(&h->shared_oldest_nonremovable,
+									   vacuum_defer_cleanup_age,
+									   h->latest_completed);
+			TransactionIdRetreatSafely(&h->data_oldest_nonremovable,
+									   vacuum_defer_cleanup_age,
+									   h->latest_completed);
+			/* defer doesn't apply to temp relations */
+
+
+			Assert(TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+												 h->shared_oldest_nonremovable));
+			Assert(TransactionIdPrecedesOrEquals(h->shared_oldest_nonremovable,
+												 h->data_oldest_nonremovable));
+		}
 	}
 
 	/*
@@ -2470,8 +2491,10 @@ GetSnapshotData(Snapshot snapshot)
 		oldestfxid = FullXidRelativeTo(latest_completed, oldestxid);
 
 		/* apply vacuum_defer_cleanup_age */
-		def_vis_xid_data =
-			TransactionIdRetreatedBy(xmin, vacuum_defer_cleanup_age);
+		def_vis_xid_data = xmin;
+		TransactionIdRetreatSafely(&def_vis_xid_data,
+								   vacuum_defer_cleanup_age,
+								   oldestfxid);
 
 		/* Check whether there's a replication slot requiring an older xmin. */
 		def_vis_xid_data =
@@ -4295,6 +4318,44 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 	return GlobalVisTestIsRemovableXid(state, xid);
 }
 
+/*
+ * Safely retract *xid by retreat_by, store the result in *xid.
+ *
+ * Need to be careful to prevent *xid from retreating below
+ * FirstNormalTransactionId during epoch 0. This is important to prevent
+ * generating xids that cannot be converted to a FullTransactionId without
+ * wrapping around.
+ *
+ * If retreat_by would lead to a too old xid, FirstNormalTransactionId is
+ * returned instead.
+ */
+static void
+TransactionIdRetreatSafely(TransactionId *xid, int retreat_by, FullTransactionId rel)
+{
+	TransactionId original_xid = *xid;
+	FullTransactionId fxid;
+	uint64		fxid_i;
+
+	Assert(TransactionIdIsNormal(original_xid));
+	Assert(retreat_by >= 0);	/* relevant GUCs are stored as ints */
+	AssertTransactionIdInAllowableRange(original_xid);
+
+	if (retreat_by == 0)
+		return;
+
+	fxid = FullXidRelativeTo(rel, original_xid);
+	fxid_i = U64FromFullTransactionId(fxid);
+
+	if ((fxid_i - FirstNormalTransactionId) <= retreat_by)
+		*xid = FirstNormalTransactionId;
+	else
+	{
+		*xid = TransactionIdRetreatedBy(original_xid, retreat_by);
+		Assert(TransactionIdIsNormal(*xid));
+		Assert(NormalTransactionIdPrecedes(*xid, original_xid));
+	}
+}
+
 /*
  * Convert a 32 bit transaction id into 64 bit transaction id, by assuming it
  * is within MaxTransactionId / 2 of XidFromFullTransactionId(rel).
-- 
2.38.0

v3-0002-WIP-very-incomplete-test-for-vacuum_defer_cleanup.patchtext/x-diff; charset=us-asciiDownload
From 356f709f3173da84c464b7aaedac5a8595619610 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 30 Jan 2023 11:50:44 -0800
Subject: [PATCH v3 2/6] WIP: very incomplete test for vacuum_defer_cleanup_age

But it's enough to notice the corruption without the bugfix
---
 src/test/recovery/meson.build            |   1 +
 src/test/recovery/t/034_defer_cleanup.pl | 173 +++++++++++++++++++++++
 2 files changed, 174 insertions(+)
 create mode 100644 src/test/recovery/t/034_defer_cleanup.pl

diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index edaaa1a3ce5..da355782b02 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -40,6 +40,7 @@ tests += {
       't/031_recovery_conflict.pl',
       't/032_relfilenode_reuse.pl',
       't/033_replay_tsp_drops.pl',
+      't/034_defer_cleanup.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/034_defer_cleanup.pl b/src/test/recovery/t/034_defer_cleanup.pl
new file mode 100644
index 00000000000..77ce6037db8
--- /dev/null
+++ b/src/test/recovery/t/034_defer_cleanup.pl
@@ -0,0 +1,173 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Check that vacuum_defer_cleanup_age works.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 1);
+$primary->start;
+
+# to make things more predictable, we schedule all vacuums ourselves
+$primary->append_conf('postgresql.conf', qq[
+autovacuum = 0
+log_line_prefix='%m [%p][%b][%v:%x][%a] '
+]);
+
+$primary->backup('backup');
+my $standby = PostgreSQL::Test::Cluster->new('standby');
+$standby->init_from_backup($primary, 'backup',
+  has_streaming => 1);
+
+# vacuum_defer_cleanup_age only makes sense without feedback
+# We don't want to ever wait for the standby to apply, otherwise the test will
+# take forever.
+$standby->append_conf('postgresql.conf', qq[
+hot_standby_feedback = 0
+max_standby_streaming_delay = 0
+]);
+$standby->start;
+
+# to avoid constantly needing to wait manually, use syncrep with remote_apply
+$primary->append_conf('postgresql.conf', qq[
+synchronous_standby_names = '*'
+synchronous_commit = remote_apply
+]);
+
+$standby->reload;
+
+
+# Find original txid at start, we want to use a cleanup age bigger than that,
+# because we had bugs with horizons wrapping around to before the big bang
+my $xid_at_start = $primary->safe_psql('postgres', 'SELECT txid_current()');
+my $defer_age = $xid_at_start + 100;
+
+note "Initial xid is $xid_at_start, will set defer to $defer_age";
+
+$primary->safe_psql('postgres', qq[
+  ALTER SYSTEM SET vacuum_defer_cleanup_age = $defer_age;
+  SELECT pg_reload_conf();
+]);
+
+$primary->safe_psql('postgres', qq[
+  CREATE TABLE testdata(id int not null unique, data text);
+  INSERT INTO testdata(id, data) VALUES(1, '1');
+  INSERT INTO testdata(id, data) VALUES(2, '2');
+  INSERT INTO testdata(id, data) VALUES(3, '3');
+  INSERT INTO testdata(id, data) VALUES(4, '4');
+  INSERT INTO testdata(id, data) VALUES(5, '5');
+]);
+
+
+# start a background psql on both nodes, to test effects on running sessions
+my $psql_timeout = IPC::Run::timer($PostgreSQL::Test::Utils::timeout_default);
+my %psql_primary = ('stdin' => '', 'stdout' => '');
+$psql_primary{run} =
+  $primary->background_psql('postgres', \$psql_primary{stdin},
+	\$psql_primary{stdout},
+	$psql_timeout);
+
+my %psql_standby = ('stdin' => '', 'stdout' => '');
+$psql_standby{run} =
+  $standby->background_psql('postgres', \$psql_standby{stdin},
+	\$psql_standby{stdout},
+	$psql_timeout);
+
+$psql_primary{stdin} .= '\t\set QUIET off';
+$psql_standby{stdin} .= '\t\set QUIET off';
+
+# start transaction on the standby
+my $q = qq[
+SET application_name = 'background_psql';
+BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+SELECT 1;
+];
+$psql_primary{stdin} .= $q;
+$psql_standby{stdin} .= $q;
+ok( pump_until_primary(qr/\n\(1 row\)\n$/s), 'started transaction');
+ok( pump_until_standby(qr/\n\(1 row\)\n$/s), 'started transaction');
+
+# update a row on the primary, select from the table to remove old row version
+my $res = $primary->safe_psql('postgres', qq[
+  SELECT txid_current();
+  UPDATE testdata SET data = data || '-updated' WHERE id = 1;
+  SELECT txid_current();
+  VACUUM testdata;
+  SELECT count(*) FROM testdata;
+  SELECT count(*) FROM testdata;
+  SELECT txid_current();
+  SET enable_seqscan = 0;
+  EXPLAIN SELECT * FROM testdata WHERE id = 1;
+  SELECT * FROM testdata WHERE id = 1;
+]);
+
+
+# check query result is as expected on primary and on a new snapshot on the standby
+$q = "SELECT data FROM testdata WHERE id = 1";
+is($primary->safe_psql('postgres', $q), '1-updated', "query result on primary as expected");
+is($standby->safe_psql('postgres', $q), '1-updated', "query result on standby as expected");
+
+$q = "SELECT data FROM testdata WHERE id = 1;\n";
+$psql_primary{stdin} .= $q;
+$psql_standby{stdin} .= $q;
+
+$res = pump_until_primary(qr/^.*\n\(1 row\)\n$/s);
+my $expect = "data\n1\n(1 row)\n";
+is( $res, $expect, 'version 1 still visible on primary after update on primary');
+$res = pump_until_standby(qr/.*\n\(1 row\)\n$/s);
+is( $res, $expect, 'row 1, version 1 still visible on standby after update on primary');
+
+$psql_primary{stdin} .= "COMMIT;\n";
+is(pump_until_primary(qr/^COMMIT\n/m), "COMMIT\n", "release on primary");
+
+
+done_testing();
+
+sub pump_until_node
+{
+	my $psql = shift;
+	my $match = shift;
+	my $ret;
+
+	pump_until($$psql{run}, $psql_timeout,
+		\$$psql{stdout}, $match);
+	$ret = $$psql{stdout};
+	$$psql{stdout} = '';
+	return $ret;
+}
+
+sub pump_until_primary
+{
+	my $match = shift;
+
+	return pump_until_node(\%psql_primary, $match);
+}
+
+sub pump_until_standby
+{
+	my $match = shift;
+
+	return pump_until_node(\%psql_standby, $match);
+}
+
+sub burn_xids
+{
+	my $cnt = shift;
+
+	my $out = $primary->safe_psql('postgres', qq[
+SELECT txid_current();
+DO $$
+  BEGIN FOR i IN 1..100 LOOP
+    PERFORM txid_current();
+    COMMIT AND CHAIN;
+  END LOOP;
+END; $$;
+SELECT txid_current();
+]);
+	note $out;
+}
-- 
2.38.0

v3-0003-wip-amcheck-Fix-bugs-relating-to-64bit-xids.patchtext/x-diff; charset=us-asciiDownload
From a3032b89ce8e8e325035e2858a2b58a92d6ede7c Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 9 Jan 2023 11:24:37 -0800
Subject: [PATCH v3 3/6] wip: amcheck: Fix bugs relating to 64bit xids

Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de
---
 contrib/amcheck/verify_heapam.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 4fcfd6df72d..56a3b3e9974 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -1576,11 +1576,25 @@ FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
 {
 	uint32		epoch;
 
+	Assert(TransactionIdIsValid(ctx->next_xid));
+	Assert(FullTransactionIdIsValid(ctx->next_fxid));
+
 	if (!TransactionIdIsNormal(xid))
 		return FullTransactionIdFromEpochAndXid(0, xid);
 	epoch = EpochFromFullTransactionId(ctx->next_fxid);
 	if (xid > ctx->next_xid)
+	{
+		/*
+		 * FIXME, doubtful this is the best fix.
+		 *
+		 * Can't represent the 32bit xid as a 64bit xid, as it's before fxid
+		 * 0. Represent it as an xid from the future instead.
+		 */
+		if (epoch == 0)
+			return FullTransactionIdFromEpochAndXid(0, xid);
+
 		epoch--;
+	}
 	return FullTransactionIdFromEpochAndXid(epoch, xid);
 }
 
@@ -1597,8 +1611,8 @@ update_cached_xid_range(HeapCheckContext *ctx)
 	LWLockRelease(XidGenLock);
 
 	/* And compute alternate versions of the same */
-	ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
 	ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
+	ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
 }
 
 /*
-- 
2.38.0

v3-0004-wip-stricter-validation-for-64bit-xids.patchtext/x-diff; charset=us-asciiDownload
From 5f995d8f3893adf263942606f2e0511e9d21f8f4 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 9 Jan 2023 11:25:43 -0800
Subject: [PATCH v3 4/6] wip: stricter validation for 64bit xids

64bit xids can't represent xids before xid 0, but 32bit xids can. This has
lead to a number of bugs, some data corrupting. To make it easier to catch
such bugs, disallow 64bit where the upper 32bit are all set, which is what one
gets when naively trying to convert such 32bit xids. Additionally explicitly
disallow 64bit xids that are already aren't allowed because they'd map to
special 32bit xids.

This changes the input routines for 64bit xids to error out in more cases.

Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de
---
 src/include/access/transam.h      | 136 ++++++++++++++++++++++++++++--
 src/backend/utils/adt/xid.c       |   7 ++
 src/backend/utils/adt/xid8funcs.c |  17 ++--
 src/test/regress/expected/xid.out |  44 ++++++++--
 src/test/regress/sql/xid.sql      |  12 ++-
 5 files changed, 192 insertions(+), 24 deletions(-)

diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index f5af6d30556..0d4bf57e63a 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -44,15 +44,6 @@
 #define TransactionIdStore(xid, dest)	(*(dest) = (xid))
 #define StoreInvalidTransactionId(dest) (*(dest) = InvalidTransactionId)
 
-#define EpochFromFullTransactionId(x)	((uint32) ((x).value >> 32))
-#define XidFromFullTransactionId(x)		((uint32) (x).value)
-#define U64FromFullTransactionId(x)		((x).value)
-#define FullTransactionIdEquals(a, b)	((a).value == (b).value)
-#define FullTransactionIdPrecedes(a, b)	((a).value < (b).value)
-#define FullTransactionIdPrecedesOrEquals(a, b) ((a).value <= (b).value)
-#define FullTransactionIdFollows(a, b) ((a).value > (b).value)
-#define FullTransactionIdFollowsOrEquals(a, b) ((a).value >= (b).value)
-#define FullTransactionIdIsValid(x)		TransactionIdIsValid(XidFromFullTransactionId(x))
 #define InvalidFullTransactionId		FullTransactionIdFromEpochAndXid(0, InvalidTransactionId)
 #define FirstNormalFullTransactionId	FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId)
 #define FullTransactionIdIsNormal(x)	FullTransactionIdFollowsOrEquals(x, FirstNormalFullTransactionId)
@@ -61,12 +52,127 @@
  * A 64 bit value that contains an epoch and a TransactionId.  This is
  * wrapped in a struct to prevent implicit conversion to/from TransactionId.
  * Not all values represent valid normal XIDs.
+ *
+ * 64bit xids that would map to a non-normal 32bit xid are not allowed, for
+ * obvious reasons.
+
+ * In addition, no xids with all the upper 32bit sets are allowed to exist,
+ * this is to make it easier to catch conversion errors from 32bit xids (which
+ * can point to "before" xid 0).
  */
 typedef struct FullTransactionId
 {
 	uint64		value;
 } FullTransactionId;
 
+#define MAX_FULL_TRANSACTION_ID ((((uint64)~(uint32)0)<<32) - 1)
+
+/*
+ * This is separate from FullTransactionIdValidRange() so that input routines
+ * can check for invalid values without triggering an assert.
+ */
+static inline bool
+InFullTransactionIdRange(uint64 val)
+{
+	/* 64bit xid above the upper bound */
+	if (val > MAX_FULL_TRANSACTION_ID)
+		return false;
+
+	/*
+	 * normal 64bit xids that'd map to special 32 xids aren't allowed.
+	 */
+	if (val >= (uint64) FirstNormalTransactionId &&
+		(uint32) val < FirstNormalTransactionId)
+		return false;
+	return true;
+}
+
+/*
+ * Validation routine, typically used in asserts.
+ */
+static inline bool
+FullTransactionIdValidRange(FullTransactionId x)
+{
+	return InFullTransactionIdRange(x.value);
+}
+
+static inline uint64
+U64FromFullTransactionId(FullTransactionId x)
+{
+	Assert(FullTransactionIdValidRange(x));
+
+	return x.value;
+}
+
+static inline TransactionId
+XidFromFullTransactionId(FullTransactionId x)
+{
+	Assert(FullTransactionIdValidRange(x));
+
+	return (TransactionId) x.value;
+}
+
+static inline uint32
+EpochFromFullTransactionId(FullTransactionId x)
+{
+	Assert(FullTransactionIdValidRange(x));
+
+	return (uint32) (x.value >> 32);
+}
+
+static inline bool
+FullTransactionIdEquals(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value == b.value;
+}
+
+static inline bool
+FullTransactionIdPrecedes(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value < b.value;
+}
+
+static inline bool
+FullTransactionIdPrecedesOrEquals(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value <= b.value;
+}
+
+static inline bool
+FullTransactionIdFollows(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value > b.value;
+}
+
+static inline bool
+FullTransactionIdFollowsOrEquals(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value >= b.value;
+}
+
+static inline bool
+FullTransactionIdIsValid(FullTransactionId x)
+{
+	Assert(FullTransactionIdValidRange(x));
+
+	return x.value != (uint64) InvalidTransactionId;
+}
+
 static inline FullTransactionId
 FullTransactionIdFromEpochAndXid(uint32 epoch, TransactionId xid)
 {
@@ -74,6 +180,8 @@ FullTransactionIdFromEpochAndXid(uint32 epoch, TransactionId xid)
 
 	result.value = ((uint64) epoch) << 32 | xid;
 
+	Assert(FullTransactionIdValidRange(result));
+
 	return result;
 }
 
@@ -84,6 +192,8 @@ FullTransactionIdFromU64(uint64 value)
 
 	result.value = value;
 
+	Assert(FullTransactionIdValidRange(result));
+
 	return result;
 }
 
@@ -102,6 +212,8 @@ FullTransactionIdFromU64(uint64 value)
 static inline void
 FullTransactionIdRetreat(FullTransactionId *dest)
 {
+	Assert(FullTransactionIdValidRange(*dest));
+
 	dest->value--;
 
 	/*
@@ -118,6 +230,8 @@ FullTransactionIdRetreat(FullTransactionId *dest)
 	 */
 	while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
 		dest->value--;
+
+	Assert(FullTransactionIdValidRange(*dest));
 }
 
 /*
@@ -127,6 +241,8 @@ FullTransactionIdRetreat(FullTransactionId *dest)
 static inline void
 FullTransactionIdAdvance(FullTransactionId *dest)
 {
+	Assert(FullTransactionIdValidRange(*dest));
+
 	dest->value++;
 
 	/* see FullTransactionIdAdvance() */
@@ -135,6 +251,8 @@ FullTransactionIdAdvance(FullTransactionId *dest)
 
 	while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
 		dest->value++;
+
+	Assert(FullTransactionIdValidRange(*dest));
 }
 
 /* back up a transaction ID variable, handling wraparound correctly */
diff --git a/src/backend/utils/adt/xid.c b/src/backend/utils/adt/xid.c
index 8ac1679c381..21c0f4c67df 100644
--- a/src/backend/utils/adt/xid.c
+++ b/src/backend/utils/adt/xid.c
@@ -188,6 +188,13 @@ xid8in(PG_FUNCTION_ARGS)
 	uint64		result;
 
 	result = uint64in_subr(str, NULL, "xid8", fcinfo->context);
+
+	if (!InFullTransactionIdRange(result))
+		ereturn(fcinfo->context, 0,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("value \"%s\" is out of range for type %s",
+						str, "xid8")));
+
 	PG_RETURN_FULLTRANSACTIONID(FullTransactionIdFromU64(result));
 }
 
diff --git a/src/backend/utils/adt/xid8funcs.c b/src/backend/utils/adt/xid8funcs.c
index e616303a292..8b33d145a36 100644
--- a/src/backend/utils/adt/xid8funcs.c
+++ b/src/backend/utils/adt/xid8funcs.c
@@ -302,15 +302,18 @@ parse_snapshot(const char *str, Node *escontext)
 	const char *str_start = str;
 	char	   *endp;
 	StringInfo	buf;
+	uint64		raw_fxid;
 
-	xmin = FullTransactionIdFromU64(strtou64(str, &endp, 10));
-	if (*endp != ':')
+	raw_fxid = strtou64(str, &endp, 10);
+	if (*endp != ':' || !InFullTransactionIdRange(raw_fxid))
 		goto bad_format;
+	xmin = FullTransactionIdFromU64(raw_fxid);
 	str = endp + 1;
 
-	xmax = FullTransactionIdFromU64(strtou64(str, &endp, 10));
-	if (*endp != ':')
+	raw_fxid = strtou64(str, &endp, 10);
+	if (*endp != ':' || !InFullTransactionIdRange(raw_fxid))
 		goto bad_format;
+	xmax = FullTransactionIdFromU64(raw_fxid);
 	str = endp + 1;
 
 	/* it should look sane */
@@ -326,7 +329,11 @@ parse_snapshot(const char *str, Node *escontext)
 	while (*str != '\0')
 	{
 		/* read next value */
-		val = FullTransactionIdFromU64(strtou64(str, &endp, 10));
+		raw_fxid = strtou64(str, &endp, 10);
+
+		val = FullTransactionIdFromU64(raw_fxid);
+		if (!InFullTransactionIdRange(raw_fxid))
+			goto bad_format;
 		str = endp;
 
 		/* require the input to be in order */
diff --git a/src/test/regress/expected/xid.out b/src/test/regress/expected/xid.out
index e62f7019434..4348eb66bd5 100644
--- a/src/test/regress/expected/xid.out
+++ b/src/test/regress/expected/xid.out
@@ -6,11 +6,11 @@ select '010'::xid,
        '-1'::xid,
 	   '010'::xid8,
 	   '42'::xid8,
-	   '0xffffffffffffffff'::xid8,
-	   '-1'::xid8;
- xid | xid |    xid     |    xid     | xid8 | xid8 |         xid8         |         xid8         
------+-----+------------+------------+------+------+----------------------+----------------------
-   8 |  42 | 4294967295 | 4294967295 |    8 |   42 | 18446744073709551615 | 18446744073709551615
+	   '0xefffffffffffffff'::xid8,
+	   '0'::xid8;
+ xid | xid |    xid     |    xid     | xid8 | xid8 |         xid8         | xid8 
+-----+-----+------------+------------+------+------+----------------------+------
+   8 |  42 | 4294967295 | 4294967295 |    8 |   42 | 17293822569102704639 |    0
 (1 row)
 
 -- garbage values
@@ -30,6 +30,18 @@ select 'asdf'::xid8;
 ERROR:  invalid input syntax for type xid8: "asdf"
 LINE 1: select 'asdf'::xid8;
                ^
+select '-1'::xid8;
+ERROR:  value "-1" is out of range for type xid8
+LINE 1: select '-1'::xid8;
+               ^
+select '0xffffffffffffffff'::xid8;
+ERROR:  value "0xffffffffffffffff" is out of range for type xid8
+LINE 1: select '0xffffffffffffffff'::xid8;
+               ^
+select '0x0000300000000001'::xid8;
+ERROR:  value "0x0000300000000001" is out of range for type xid8
+LINE 1: select '0x0000300000000001'::xid8;
+               ^
 -- Also try it with non-error-throwing API
 SELECT pg_input_is_valid('42', 'xid');
  pg_input_is_valid 
@@ -67,6 +79,24 @@ SELECT pg_input_error_message('0xffffffffffffffffffff', 'xid8');
  value "0xffffffffffffffffffff" is out of range for type xid8
 (1 row)
 
+SELECT pg_input_is_valid('-1', 'xid8');
+ pg_input_is_valid 
+-------------------
+ f
+(1 row)
+
+SELECT pg_input_is_valid('0xffffffffffffffff', 'xid8');
+ pg_input_is_valid 
+-------------------
+ f
+(1 row)
+
+SELECT pg_input_is_valid('0x0000300000000001', 'xid8');
+ pg_input_is_valid 
+-------------------
+ f
+(1 row)
+
 -- equality
 select '1'::xid = '1'::xid;
  ?column? 
@@ -160,11 +190,11 @@ select xid8cmp('1', '2'), xid8cmp('2', '2'), xid8cmp('2', '1');
 
 -- min() and max() for xid8
 create table xid8_t1 (x xid8);
-insert into xid8_t1 values ('0'), ('010'), ('42'), ('0xffffffffffffffff'), ('-1');
+insert into xid8_t1 values ('0'), ('010'), ('42'), ('0xefffffffffffffff');
 select min(x), max(x) from xid8_t1;
  min |         max          
 -----+----------------------
-   0 | 18446744073709551615
+   0 | 17293822569102704639
 (1 row)
 
 -- xid8 has btree and hash opclasses
diff --git a/src/test/regress/sql/xid.sql b/src/test/regress/sql/xid.sql
index b6996588ef6..b2104e4ce20 100644
--- a/src/test/regress/sql/xid.sql
+++ b/src/test/regress/sql/xid.sql
@@ -7,14 +7,17 @@ select '010'::xid,
        '-1'::xid,
 	   '010'::xid8,
 	   '42'::xid8,
-	   '0xffffffffffffffff'::xid8,
-	   '-1'::xid8;
+	   '0xefffffffffffffff'::xid8,
+	   '0'::xid8;
 
 -- garbage values
 select ''::xid;
 select 'asdf'::xid;
 select ''::xid8;
 select 'asdf'::xid8;
+select '-1'::xid8;
+select '0xffffffffffffffff'::xid8;
+select '0x0000300000000001'::xid8;
 
 -- Also try it with non-error-throwing API
 SELECT pg_input_is_valid('42', 'xid');
@@ -23,6 +26,9 @@ SELECT pg_input_error_message('0xffffffffff', 'xid');
 SELECT pg_input_is_valid('42', 'xid8');
 SELECT pg_input_is_valid('asdf', 'xid8');
 SELECT pg_input_error_message('0xffffffffffffffffffff', 'xid8');
+SELECT pg_input_is_valid('-1', 'xid8');
+SELECT pg_input_is_valid('0xffffffffffffffff', 'xid8');
+SELECT pg_input_is_valid('0x0000300000000001', 'xid8');
 
 -- equality
 select '1'::xid = '1'::xid;
@@ -51,7 +57,7 @@ select xid8cmp('1', '2'), xid8cmp('2', '2'), xid8cmp('2', '1');
 
 -- min() and max() for xid8
 create table xid8_t1 (x xid8);
-insert into xid8_t1 values ('0'), ('010'), ('42'), ('0xffffffffffffffff'), ('-1');
+insert into xid8_t1 values ('0'), ('010'), ('42'), ('0xefffffffffffffff');
 select min(x), max(x) from xid8_t1;
 
 -- xid8 has btree and hash opclasses
-- 
2.38.0

#22Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Andres Freund (#21)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Mon, 30 Jan 2023 at 21:19, Andres Freund <andres@anarazel.de> wrote:

On 2023-01-10 21:32:54 +0100, Matthias van de Meent wrote:

On Tue, 10 Jan 2023 at 20:14, Andres Freund <andres@anarazel.de> wrote:

On 2023-01-10 15:03:42 +0100, Matthias van de Meent wrote:
What precisely do you mean with "skew" here? Do you just mean that it'd take a
long time until vacuum_defer_cleanup_age takes effect? Somehow it sounds like
you might mean more than that?

h->oldest_considered_running can be extremely old due to the global
nature of the value and the potential existence of a snapshot in
another database that started in parallel to a very old running
transaction.

Here's a version that, I think, does not have that issue.

Thanks!

In an earlier, not posted, version I had an vacuum_defer_cleanup_age specific
helper function for this, but it seems likely we'll need it in other places
too. So I named it TransactionIdRetreatSafely(). I made it accept the xid by
pointer, as the line lengths / repetition otherwise end up making it hard to
read the code. For now I have TransactionIdRetreatSafely() be private to
procarray.c, but I expect we'll have to change that eventually.

If TransactionIdRetreatSafely will be exposed outside procarray.c,
then I think the xid pointer should be replaced with normal
arguments/returns; both for parity with TransactionIdRetreatedBy and
to remove this memory store dependency in this hot code path.

Not sure I like TransactionIdRetreatSafely() as a name. Maybe
TransactionIdRetreatClamped() is better?

I think the 'safely' version is fine.

I've been working on a test for vacuum_defer_cleanup_age. It does catch the
corruption at hand, but not much more. It's quite painful to write, right
now. Some of the reasons:
/messages/by-id/20230130194350.zj5v467x4jgqt3d6@awork3.anarazel.de

I'm tempted to go with reinterpreting 64bit xids as signed. Except that it
seems like a mighty invasive change to backpatch.

I'm not sure either. Protecting against underflow by halving the
effective valid value space is quite the intervention, but if it is
necessary to make this work in a performant manner, it would be worth
it. Maybe someone else with more experience can provide their opinion
here.

The attached assertions just removes 1/2**32'ths of the space, by reserving
the xid range with the upper 32bit set as something that shouldn't be
reachable.

I think that is acceptible.

Still requires us to change the input routines to reject that range, but I
think that's a worthy tradeoff.

Agreed.

I didn't find the existing limits for the
type to be documented anywhere.

Obviously something like that could only go into HEAD.

Yeah.

Comments on 0003:

+        /*
+         * FIXME, doubtful this is the best fix.
+         *
+         * Can't represent the 32bit xid as a 64bit xid, as it's before fxid
+         * 0. Represent it as an xid from the future instead.
+         */
+        if (epoch == 0)
+            return FullTransactionIdFromEpochAndXid(0, xid);

Shouldn't this be an error condition instead, as this XID should not
be able to appear?

on 0004:

-       '0xffffffffffffffff'::xid8,
-       '-1'::xid8;
+       '0xefffffffffffffff'::xid8,
+       '0'::xid8;

The 0xFF... usages were replaced with "0xEFFF...". Shouldn't we also
test on 0xffff_fffE_ffff_ffff to test for input of our actual max
value?

@@ -326,7 +329,11 @@ parse_snapshot(const char *str, Node *escontext)
while (*str != '\0')
{
/* read next value */
-        val = FullTransactionIdFromU64(strtou64(str, &endp, 10));
+        raw_fxid = strtou64(str, &endp, 10);
+
+        val = FullTransactionIdFromU64(raw_fxid);
+        if (!InFullTransactionIdRange(raw_fxid))
+            goto bad_format;

With assertions enabled FullTransactionIdFromU64 will assert the
InFullTransactionIdRange condition, meaning we wouldn't hit the branch
into bad_format.
I think these operations should be swapped, as parsing a snapshot
shouldn't run into assertion failures like this if it can error
normally. Maybe this can be added to tests as well?

Kind regards,

Matthias van de Meent

#23Andres Freund
andres@anarazel.de
In reply to: Matthias van de Meent (#22)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hi,

On 2023-01-31 15:05:17 +0100, Matthias van de Meent wrote:

On Mon, 30 Jan 2023 at 21:19, Andres Freund <andres@anarazel.de> wrote:

In an earlier, not posted, version I had an vacuum_defer_cleanup_age specific
helper function for this, but it seems likely we'll need it in other places
too. So I named it TransactionIdRetreatSafely(). I made it accept the xid by
pointer, as the line lengths / repetition otherwise end up making it hard to
read the code. For now I have TransactionIdRetreatSafely() be private to
procarray.c, but I expect we'll have to change that eventually.

If TransactionIdRetreatSafely will be exposed outside procarray.c,
then I think the xid pointer should be replaced with normal
arguments/returns; both for parity with TransactionIdRetreatedBy

That's why I named one version *Retreat the other Retreated :)

I think it'll make the code easier to read in the other places too, the
variable names / function names in this space are uncomfortably long to
fit into 78chars..., particularly when there's two references to the
same variable in the same line.

and to remove this memory store dependency in this hot code path.

I doubt that matters much here and the places it's going to be used
in. And presumably the compiler will inline it anyway. I'd probably make
it a static inline in the header too.

What's making me hesitate about exposing it is that it's quite easy to
get things wrong by using a wrong fxid or such.

+        /*
+         * FIXME, doubtful this is the best fix.
+         *
+         * Can't represent the 32bit xid as a 64bit xid, as it's before fxid
+         * 0. Represent it as an xid from the future instead.
+         */
+        if (epoch == 0)
+            return FullTransactionIdFromEpochAndXid(0, xid);

Shouldn't this be an error condition instead, as this XID should not
be able to appear?

If you mean error in the sense of ERROR, no, I don't think so. That code
tries hard to be able to check many tuples in a row. And if we were to
error out here, we'd not able to do that. We should still report those
tuples as corrupt, fwiw.

The reason this path is hit is that a test intentionally corrupts some
xids. So the path is reachable and we need to cope somehow.

I'm not really satisfied with this fix either - I mostly wanted to
include something sufficient to prevent assertion failures.

I had hoped that Mark would look at the amcheck bits and come up with
more complete fixes.

on 0004:

-       '0xffffffffffffffff'::xid8,
-       '-1'::xid8;
+       '0xefffffffffffffff'::xid8,
+       '0'::xid8;

The 0xFF... usages were replaced with "0xEFFF...". Shouldn't we also
test on 0xffff_fffE_ffff_ffff to test for input of our actual max
value?

Probably a good idea.

@@ -326,7 +329,11 @@ parse_snapshot(const char *str, Node *escontext)
while (*str != '\0')
{
/* read next value */
-        val = FullTransactionIdFromU64(strtou64(str, &endp, 10));
+        raw_fxid = strtou64(str, &endp, 10);
+
+        val = FullTransactionIdFromU64(raw_fxid);
+        if (!InFullTransactionIdRange(raw_fxid))
+            goto bad_format;

With assertions enabled FullTransactionIdFromU64 will assert the
InFullTransactionIdRange condition, meaning we wouldn't hit the branch
into bad_format.
I think these operations should be swapped, as parsing a snapshot
shouldn't run into assertion failures like this if it can error
normally.

Yep.

Maybe this can be added to tests as well?

I'll check. I thought for a bit it'd not work because we'd perform range
checks on the xids, but we don't...

Greetings,

Andres Freund

#24Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Andres Freund (#23)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Tue, 31 Jan 2023 at 23:48, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2023-01-31 15:05:17 +0100, Matthias van de Meent wrote:

If TransactionIdRetreatSafely will be exposed outside procarray.c,
then I think the xid pointer should be replaced with normal
arguments/returns; both for parity with TransactionIdRetreatedBy

That's why I named one version *Retreat the other Retreated :)

That part I noticed too :) I don't mind either way, I was just
concerned with exposing the function as a prototype, not as an inline
static.

I think it'll make the code easier to read in the other places too, the
variable names / function names in this space are uncomfortably long to
fit into 78chars..., particularly when there's two references to the
same variable in the same line.

I guess that's true, and once inlined there should indeed be no extra
runtime overhead.

78 chars

Didn't we use 80 columns/chars? How did you get to 78? Not that I
can't think of any ways, but none of them stand out to me as obviously
correct.

and to remove this memory store dependency in this hot code path.

I doubt that matters much here and the places it's going to be used
in.

I thought that this was executed while still in ProcArrayLock, but
instead we've released that lock already by the time we're trying to
adjust the horizons, so the 'hot code path' concern is mostly
relieved.

And presumably the compiler will inline it anyway. I'd probably make
it a static inline in the header too.

Yes, my concern was based on an extern prototype with private
implementation, as that does prohibit inlining and thus would have a
requirement to push the data to memory (probably only L1, but still
memory).

What's making me hesitate about exposing it is that it's quite easy to
get things wrong by using a wrong fxid or such.

I'm less concerned about that when the function is well-documented.

+        /*
+         * FIXME, doubtful this is the best fix.
+         *
+         * Can't represent the 32bit xid as a 64bit xid, as it's before fxid
+         * 0. Represent it as an xid from the future instead.
+         */
+        if (epoch == 0)
+            return FullTransactionIdFromEpochAndXid(0, xid);

Shouldn't this be an error condition instead, as this XID should not
be able to appear?

If you mean error in the sense of ERROR, no, I don't think so. That code
tries hard to be able to check many tuples in a row. And if we were to
error out here, we'd not able to do that. We should still report those
tuples as corrupt, fwiw.

The reason this path is hit is that a test intentionally corrupts some
xids. So the path is reachable and we need to cope somehow.

I see.

Kind regards,

Matthias van de Meent

#25Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#10)
1 attachment(s)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hi,

Heikki, Andrey, CCing you because you wrote

commit 6655a7299d835dea9e8e0ba69cc5284611b96f29
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: 2019-07-24 20:24:07 +0300

Use full 64-bit XID for checking if a deleted GiST page is old enough.

On 2023-01-07 19:09:56 -0800, Andres Freund wrote:

I haven't found other problematic places in HEAD, but did end up find a less
serious version of this bug in < 14: GetFullRecentGlobalXmin(). I did verify
that with vacuum_defer_cleanup_age set GetFullRecentGlobalXmin() returns
values that look likely to cause problems. Its "just" used in gist luckily.

Is there a good way to make breakage in the page recycling mechanism
visible with gist? I guess to see corruption, I'd have to halt a scan
before a page is visited with gdb, then cause the page to be recycled
prematurely in another session, then unblock the first? Which'd then
visit that page, thinking it to be in a different part of the tree than
it actually is?

I'm pretty sure it's broken though.

On 13, with vacuum_defer_cleanup_age=0, the attached script has two
consecutive VACUUM VERBOSEs output

106 index pages have been deleted, 0 are currently reusable.
106 index pages have been deleted, 0 are currently reusable.

in the presence of a prepared transaction. Which makes sense.

But with vacuum_defer_cleanup_age=10000

106 index pages have been deleted, 0 are currently reusable.
106 index pages have been deleted, 106 are currently reusable.

which clearly doesn't seem right.

I just can't quite judge how bad that is.

Greetings,

Andres Freund

Attachments:

gist-13-defer.sqlapplication/sqlDownload
#26Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: Andres Freund (#25)
1 attachment(s)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hello, Andres.

Not sure I like TransactionIdRetreatSafely() as a name. Maybe
TransactionIdRetreatClamped() is better?

I think it is better to just replace TransactionIdRetreatedBy.
It is not used anymore after
`v3-0001-WIP-Fix-corruption-due-to-vacuum_defer_cleanup_ag.patch` -
so, it is better to replace the dangerous version in order to avoid
similar issues in the future.
But we need also to move FullXidRelativeTo in that case (not sure it is safe).

I think it'll make the code easier to read in the other places too, the
variable names / function names in this space are uncomfortably long to
fit into 78chars..., particularly when there's two references to the
same variable in the same line.

Looks fine for my taste, but it is pretty far from perfect :)

Best regards,
Michail.

Attachments:

0001-WIP-Fix-corruption-due-to-vacuum_defer_cleanup_age-u.patchtext/plain; charset=US-ASCII; name=0001-WIP-Fix-corruption-due-to-vacuum_defer_cleanup_age-u.patchDownload
From ffa80fece3384e3a12a52de18102a0d169f52841 Mon Sep 17 00:00:00 2001
From: Nkey <michail.nikolaev@gmail.com>
Date: Sat, 4 Feb 2023 15:16:52 +0300
Subject: [PATCH] WIP: Fix corruption due to vacuum_defer_cleanup_age
 underflowing 64bit xids

---
 src/backend/storage/ipc/procarray.c | 56 ++++++++++++-----------------
 src/include/access/transam.h        | 71 +++++++++++++++++++++++++++++++++----
 2 files changed, 87 insertions(+), 40 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4340bf9641..5dd662cd75 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -368,8 +368,6 @@ static void ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid);
 static void MaintainLatestCompletedXid(TransactionId latestXid);
 static void MaintainLatestCompletedXidRecovery(TransactionId latestXid);
 
-static inline FullTransactionId FullXidRelativeTo(FullTransactionId rel,
-												  TransactionId xid);
 static void GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons);
 
 /*
@@ -1889,16 +1887,32 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		 * in varsup.c.  Also note that we intentionally don't apply
 		 * vacuum_defer_cleanup_age on standby servers.
 		 */
+		Assert(TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+											 h->shared_oldest_nonremovable));
+		Assert(TransactionIdPrecedesOrEquals(h->shared_oldest_nonremovable,
+											 h->data_oldest_nonremovable));
+
 		h->oldest_considered_running =
 			TransactionIdRetreatedBy(h->oldest_considered_running,
-									 vacuum_defer_cleanup_age);
-		h->shared_oldest_nonremovable =
+									 vacuum_defer_cleanup_age,
+									 h->latest_completed);
+
+		h->shared_oldest_nonremovable = 
 			TransactionIdRetreatedBy(h->shared_oldest_nonremovable,
-									 vacuum_defer_cleanup_age);
-		h->data_oldest_nonremovable =
+									 vacuum_defer_cleanup_age,
+									 h->latest_completed);
+
+		h->data_oldest_nonremovable = 
 			TransactionIdRetreatedBy(h->data_oldest_nonremovable,
-									 vacuum_defer_cleanup_age);
+									 vacuum_defer_cleanup_age,
+									h->latest_completed);
+
 		/* defer doesn't apply to temp relations */
+
+		Assert(TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+											 h->shared_oldest_nonremovable));
+		Assert(TransactionIdPrecedesOrEquals(h->shared_oldest_nonremovable,
+											 h->data_oldest_nonremovable));
 	}
 
 	/*
@@ -2471,7 +2485,7 @@ GetSnapshotData(Snapshot snapshot)
 
 		/* apply vacuum_defer_cleanup_age */
 		def_vis_xid_data =
-			TransactionIdRetreatedBy(xmin, vacuum_defer_cleanup_age);
+			TransactionIdRetreatedBy(xmin, vacuum_defer_cleanup_age, oldestfxid);
 
 		/* Check whether there's a replication slot requiring an older xmin. */
 		def_vis_xid_data =
@@ -4295,32 +4309,6 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 	return GlobalVisTestIsRemovableXid(state, xid);
 }
 
-/*
- * Convert a 32 bit transaction id into 64 bit transaction id, by assuming it
- * is within MaxTransactionId / 2 of XidFromFullTransactionId(rel).
- *
- * Be very careful about when to use this function. It can only safely be used
- * when there is a guarantee that xid is within MaxTransactionId / 2 xids of
- * rel. That e.g. can be guaranteed if the caller assures a snapshot is
- * held by the backend and xid is from a table (where vacuum/freezing ensures
- * the xid has to be within that range), or if xid is from the procarray and
- * prevents xid wraparound that way.
- */
-static inline FullTransactionId
-FullXidRelativeTo(FullTransactionId rel, TransactionId xid)
-{
-	TransactionId rel_xid = XidFromFullTransactionId(rel);
-
-	Assert(TransactionIdIsValid(xid));
-	Assert(TransactionIdIsValid(rel_xid));
-
-	/* not guaranteed to find issues, but likely to catch mistakes */
-	AssertTransactionIdInAllowableRange(xid);
-
-	return FullTransactionIdFromU64(U64FromFullTransactionId(rel)
-									+ (int32) (xid - rel_xid));
-}
-
 
 /* ----------------------------------------------
  *		KnownAssignedTransactionIds sub-module
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index f5af6d3055..e094e80859 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -317,16 +317,75 @@ ReadNextTransactionId(void)
 	return XidFromFullTransactionId(ReadNextFullTransactionId());
 }
 
-/* return transaction ID backed up by amount, handling wraparound correctly */
+/*
+ * Convert a 32 bit transaction id into 64 bit transaction id, by assuming it
+ * is within MaxTransactionId / 2 of XidFromFullTransactionId(rel).
+ *
+ * Be very careful about when to use this function. It can only safely be used
+ * when there is a guarantee that xid is within MaxTransactionId / 2 xids of
+ * rel. That e.g. can be guaranteed if the caller assures a snapshot is
+ * held by the backend and xid is from a table (where vacuum/freezing ensures
+ * the xid has to be within that range), or if xid is from the procarray and
+ * prevents xid wraparound that way.
+ */
+static inline FullTransactionId
+FullXidRelativeTo(FullTransactionId rel, TransactionId xid)
+{
+	TransactionId rel_xid = XidFromFullTransactionId(rel);
+
+	Assert(TransactionIdIsValid(xid));
+	Assert(TransactionIdIsValid(rel_xid));
+
+	/* not guaranteed to find issues, but likely to catch mistakes */
+	AssertTransactionIdInAllowableRange(xid);
+
+	return FullTransactionIdFromU64(U64FromFullTransactionId(rel)
+		+ (int32)(xid - rel_xid));
+}
+
+/*
+ * return transaction ID backed up by amount, handling wraparound correctly
+ *
+ * Need to be careful to prevent xid from retreating below
+ * FirstNormalTransactionId during epoch 0. This is important to prevent
+ * generating xids that cannot be converted to a FullTransactionId without
+ * wrapping around.
+ *
+ * If amount would lead to a too old xid, FirstNormalTransactionId is
+ * returned instead.
+ */
+
 static inline TransactionId
-TransactionIdRetreatedBy(TransactionId xid, uint32 amount)
+TransactionIdRetreatedBy(TransactionId xid, int amount, FullTransactionId rel)
 {
-	xid -= amount;
+	FullTransactionId fxid;
+	uint64		fxid_i;
+	TransactionId r;
+
+	Assert(TransactionIdIsNormal(xid));
+	Assert(amount >= 0);	/* relevant GUCs are stored as ints */
+	AssertTransactionIdInAllowableRange(xid);
+
+	if (amount == 0)
+		return xid;
+
+	fxid = FullXidRelativeTo(rel, xid);
+	fxid_i = U64FromFullTransactionId(fxid);
+
+	if ((fxid_i - FirstNormalTransactionId) <= amount)
+		r = FirstNormalTransactionId;
+	else
+	{
+		r = xid - amount;
+
+		while (r < FirstNormalTransactionId)
+			r--;
 
-	while (xid < FirstNormalTransactionId)
-		xid--;
+		Assert(TransactionIdIsNormal(r));
+		Assert(NormalTransactionIdPrecedes(r, xid));
+	}
 
-	return xid;
+	return r;
 }
 
 /* return the older of the two IDs */
-- 
2.16.2.windows.1

#27Andrey Borodin
amborodin86@gmail.com
In reply to: Andres Freund (#25)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Sat, Feb 4, 2023 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:

Is there a good way to make breakage in the page recycling mechanism
visible with gist? I guess to see corruption, I'd have to halt a scan
before a page is visited with gdb, then cause the page to be recycled
prematurely in another session, then unblock the first? Which'd then
visit that page, thinking it to be in a different part of the tree than
it actually is?

In most cases landing on one extra page will not affect the scan.
Worst case that I can imagine - scan is landing on a page that is the
new parent of the deleted page. Even then we cannot end up with
infinite index scan - we will just make one extra loop. Although,
IndexScan will yield duplicate tids.

In case of interference with concurrent insertion we will get a tree
structure departed from optimal, but that is not a problem.

Best regards, Andrey Borodin.

In reply to: Andres Freund (#25)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Sat, Feb 4, 2023 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:

Is there a good way to make breakage in the page recycling mechanism
visible with gist? I guess to see corruption, I'd have to halt a scan
before a page is visited with gdb, then cause the page to be recycled
prematurely in another session, then unblock the first? Which'd then
visit that page, thinking it to be in a different part of the tree than
it actually is?

Yes. This bug is similar to an ancient nbtree bug fixed back in 2012,
by commit d3abbbeb.

which clearly doesn't seem right.

I just can't quite judge how bad that is.

It's really hard to judge, even if you're an expert. We're talking
about a fairly chaotic scenario. My guess is that there is a very
small chance of a very unpleasant scenario if you have a GiST index
that has regular page deletions, and if you use
vacuum_defer_cleanup_age. It's likely that most GiST indexes never
have any page deletions due to the workload characteristics.

--
Peter Geoghegan

#29Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#28)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hi,

On 2023-02-04 11:10:55 -0800, Peter Geoghegan wrote:

On Sat, Feb 4, 2023 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:

Is there a good way to make breakage in the page recycling mechanism
visible with gist? I guess to see corruption, I'd have to halt a scan
before a page is visited with gdb, then cause the page to be recycled
prematurely in another session, then unblock the first? Which'd then
visit that page, thinking it to be in a different part of the tree than
it actually is?

Yes. This bug is similar to an ancient nbtree bug fixed back in 2012,
by commit d3abbbeb.

which clearly doesn't seem right.

I just can't quite judge how bad that is.

It's really hard to judge, even if you're an expert. We're talking
about a fairly chaotic scenario. My guess is that there is a very
small chance of a very unpleasant scenario if you have a GiST index
that has regular page deletions, and if you use
vacuum_defer_cleanup_age. It's likely that most GiST indexes never
have any page deletions due to the workload characteristics.

Thanks.

Sounds like a problem here is too hard to repro. I mostly wanted to know how
to be more confident about a fix working correctly. There's no tests for the
whole page recycling behaviour, afaics, so it's a bit scary to change things
around.

I didn't quite feel confident pushing a fix for this just before a minor
release, so I'll push once the minor releases are tagged. A quite minimal fix
to GetFullRecentGlobalXmin() in 12-13 (returning FirstNormalTransactionId if
epoch == 0 and RecentGlobalXmin > nextxid_xid), and the slightly larger fix in
14+.

Greetings,

Andres Freund

#30Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#29)
3 attachment(s)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hi,

On 2023-02-06 13:02:05 -0800, Andres Freund wrote:

I didn't quite feel confident pushing a fix for this just before a minor
release, so I'll push once the minor releases are tagged. A quite minimal fix
to GetFullRecentGlobalXmin() in 12-13 (returning FirstNormalTransactionId if
epoch == 0 and RecentGlobalXmin > nextxid_xid), and the slightly larger fix in
14+.

Pushed that.

Mark:

I worked some more on the fixes for amcheck, and fixes for amcheck.

The second amcheck fix ends up correcting some inaccurate output in the tests
- previously xids from before xid 0 were reported to be in the future.

Previously there was no test case exercising exceeding nextxid, without
wrapping around into the past. I added that at the end of
004_verify_heapam.pl, because renumbering seemed too annoying.

What do you think?

Somewhat random note:

Is it intentional that we VACUUM FREEZE test ROWCOUNT times? That's
effectively O(ROWCOUNT^2), albeit with small enough constants to not really
matter. I don't think we need to insert the rows one-by-one either. Changing
that to a single INSERT and FREEZE shaves 10-12% off the tests. I didn't
change that, but we also fire off a psql for each tuple for heap_page_items(),
with offset $N no less. That seems to be another 500ms.

Greetings,

Andres Freund

Attachments:

v5-0001-amcheck-Fix-ordering-bug-in-update_cached_xid_ran.patchtext/x-diff; charset=us-asciiDownload
From 3f67523bac084964c0e780fddd509f3464d32282 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 8 Mar 2023 11:37:05 -0800
Subject: [PATCH v5 1/8] amcheck: Fix ordering bug in update_cached_xid_range()

The initialization order in update_cached_xid_range() was wrong, calling
FullTransactionIdFromXidAndCtx() before setting
->next_xid. FullTransactionIdFromXidAndCtx() uses ->next_xid.

In most situations this will not cause visible issues, because the next call
to update_cached_xid_range() will use a less wrong ->next_xid. It's rare that
xids advance fast enough for this to be a problem.

Found while adding more asserts to the 64bit xid infrastructure.

Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de
Backpatch: 14-, where heapam verification was introduced
---
 contrib/amcheck/verify_heapam.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 4fcfd6df72d..33c5b338959 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -1576,6 +1576,9 @@ FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
 {
 	uint32		epoch;
 
+	Assert(TransactionIdIsNormal(ctx->next_xid));
+	Assert(FullTransactionIdIsNormal(ctx->next_fxid));
+
 	if (!TransactionIdIsNormal(xid))
 		return FullTransactionIdFromEpochAndXid(0, xid);
 	epoch = EpochFromFullTransactionId(ctx->next_fxid);
@@ -1597,8 +1600,8 @@ update_cached_xid_range(HeapCheckContext *ctx)
 	LWLockRelease(XidGenLock);
 
 	/* And compute alternate versions of the same */
-	ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
 	ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
+	ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
 }
 
 /*
-- 
2.38.0

v5-0002-amcheck-Fix-FullTransactionIdFromXidAndCtx-for-xi.patchtext/x-diff; charset=us-asciiDownload
From c376f9c698c2c9cf9ad5316ceb96160611225430 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 8 Mar 2023 11:57:34 -0800
Subject: [PATCH v5 2/8] amcheck: Fix FullTransactionIdFromXidAndCtx() for xids
 before epoch 0

64bit xids can't represent xids before epoch 0 (see also be504a3e974). When
FullTransactionIdFromXidAndCtx() was passed such an xid, it'd create a 64bit
xid far into the future. Noticed while adding assertions in the course of
investigating be504a3e974, as amcheck's test create such xids.

To fix the issue, just return FirstNormalFullTransactionId in this case. A
freshly initdb'd cluster already has a newer horizon. The most minimal version
of this would make the messages for some detected corruptions differently
inaccurate. To make those cases accurate, switch
FullTransactionIdFromXidAndCtx() to use the 32bit modulo difference between
xid and nextxid to compute the 64bit xid, yielding sensible "in the future" /
"in the past" answers.

Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de
Backpatch: 14-, where heapam verification was introduced
---
 src/bin/pg_amcheck/t/004_verify_heapam.pl | 28 +++++++++++++------
 contrib/amcheck/verify_heapam.c           | 33 +++++++++++++++++++----
 2 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/src/bin/pg_amcheck/t/004_verify_heapam.pl b/src/bin/pg_amcheck/t/004_verify_heapam.pl
index 215c30eaa8e..9984d0d9f87 100644
--- a/src/bin/pg_amcheck/t/004_verify_heapam.pl
+++ b/src/bin/pg_amcheck/t/004_verify_heapam.pl
@@ -217,7 +217,7 @@ my $rel = $node->safe_psql('postgres',
 my $relpath = "$pgdata/$rel";
 
 # Insert data and freeze public.test
-use constant ROWCOUNT => 16;
+use constant ROWCOUNT => 17;
 $node->safe_psql(
 	'postgres', qq(
 	INSERT INTO public.test (a, b, c)
@@ -378,23 +378,24 @@ for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
 	elsif ($offnum == 3)
 	{
 		# Corruptly set xmin < datfrozenxid, further back, noting circularity
-		# of xid comparison.  For a new cluster with epoch = 0, the corrupt
-		# xmin will be interpreted as in the future
-		$tup->{t_xmin} = 4026531839;
+		# of xid comparison.
+		my $xmin = 4026531839;
+		$tup->{t_xmin} = $xmin;
 		$tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
 		$tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
 
 		push @expected,
-		  qr/${$header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+		  qr/${$header}xmin ${xmin} precedes oldest valid transaction ID 0:\d+/;
 	}
 	elsif ($offnum == 4)
 	{
 		# Corruptly set xmax < relminmxid;
-		$tup->{t_xmax} = 4026531839;
+		my $xmax = 4026531839;
+		$tup->{t_xmax} = $xmax;
 		$tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
 
 		push @expected,
-		  qr/${$header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+		  qr/${$header}xmax ${xmax} precedes oldest valid transaction ID 0:\d+/;
 	}
 	elsif ($offnum == 5)
 	{
@@ -502,7 +503,7 @@ for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
 		push @expected,
 		  qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
 	}
-	elsif ($offnum == 15)    # Last offnum must equal ROWCOUNT
+	elsif ($offnum == 15)
 	{
 		# Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
 		$tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
@@ -512,6 +513,17 @@ for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
 		push @expected,
 		  qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
 	}
+	elsif ($offnum == 16)    # Last offnum must equal ROWCOUNT
+	{
+		# Corruptly set xmin > next_xid to be in the future.
+		my $xmin = 123456;
+		$tup->{t_xmin} = $xmin;
+		$tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+		$tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+		push @expected,
+          qr/${$header}xmin ${xmin} equals or exceeds next valid transaction ID 0:\d+/;
+	}
 	write_tuple($file, $offset, $tup);
 }
 close($file)
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 33c5b338959..94ddccd23a8 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -1574,17 +1574,40 @@ check_tuple(HeapCheckContext *ctx)
 static FullTransactionId
 FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
 {
-	uint32		epoch;
+	uint64		nextfxid_i;
+	int32		diff;
+	FullTransactionId fxid;
 
 	Assert(TransactionIdIsNormal(ctx->next_xid));
 	Assert(FullTransactionIdIsNormal(ctx->next_fxid));
+	Assert(XidFromFullTransactionId(ctx->next_fxid) == ctx->next_xid);
 
 	if (!TransactionIdIsNormal(xid))
 		return FullTransactionIdFromEpochAndXid(0, xid);
-	epoch = EpochFromFullTransactionId(ctx->next_fxid);
-	if (xid > ctx->next_xid)
-		epoch--;
-	return FullTransactionIdFromEpochAndXid(epoch, xid);
+
+	nextfxid_i = U64FromFullTransactionId(ctx->next_fxid);
+
+	/* compute the 32bit modulo difference */
+	diff = (int32) (ctx->next_xid - xid);
+
+	/*
+	 * In cases of corruption we might see a 32bit xid that is before epoch
+	 * 0. We can't represent that as a 64bit xid, due to 64bit xids being
+	 * unsigned integers, without the modulo arithmetic of 32bit xid. There's
+	 * no really nice way to deal with that, but it works ok enough to use
+	 * FirstNormalFullTransactionId in that case, as a freshly initdb'd
+	 * cluster already has a newer horizon.
+	 */
+	if (diff > 0 && (nextfxid_i - FirstNormalTransactionId) < (int64) diff)
+	{
+		Assert(EpochFromFullTransactionId(ctx->next_fxid) == 0);
+		fxid = FirstNormalFullTransactionId;
+	}
+	else
+		fxid = FullTransactionIdFromU64(nextfxid_i - diff);
+
+	Assert(FullTransactionIdIsNormal(fxid));
+	return fxid;
 }
 
 /*
-- 
2.38.0

v5-0003-wip-stricter-validation-for-64bit-xids.patchtext/x-diff; charset=us-asciiDownload
From 327d170e25a90e7ea38bad37b32d207972edcc7b Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 27 Feb 2023 17:09:21 -0800
Subject: [PATCH v5 3/8] wip: stricter validation for 64bit xids

64bit xids can't represent xids before xid 0, but 32bit xids can. This has
lead to a number of bugs, some data corrupting. To make it easier to catch
such bugs, disallow 64bit where the upper 32bit are all set, which is what one
gets when naively trying to convert such 32bit xids. Additionally explicitly
disallow 64bit xids that are already aren't allowed because they'd map to
special 32bit xids.

This changes the input routines for 64bit xids to error out in more cases.

Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de
---
 src/include/access/transam.h      | 136 ++++++++++++++++++++++++++++--
 src/backend/utils/adt/xid.c       |   7 ++
 src/backend/utils/adt/xid8funcs.c |  17 ++--
 src/test/regress/expected/xid.out |  44 ++++++++--
 src/test/regress/sql/xid.sql      |  12 ++-
 5 files changed, 192 insertions(+), 24 deletions(-)

diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index f5af6d30556..0cf112e5bf2 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -44,15 +44,6 @@
 #define TransactionIdStore(xid, dest)	(*(dest) = (xid))
 #define StoreInvalidTransactionId(dest) (*(dest) = InvalidTransactionId)
 
-#define EpochFromFullTransactionId(x)	((uint32) ((x).value >> 32))
-#define XidFromFullTransactionId(x)		((uint32) (x).value)
-#define U64FromFullTransactionId(x)		((x).value)
-#define FullTransactionIdEquals(a, b)	((a).value == (b).value)
-#define FullTransactionIdPrecedes(a, b)	((a).value < (b).value)
-#define FullTransactionIdPrecedesOrEquals(a, b) ((a).value <= (b).value)
-#define FullTransactionIdFollows(a, b) ((a).value > (b).value)
-#define FullTransactionIdFollowsOrEquals(a, b) ((a).value >= (b).value)
-#define FullTransactionIdIsValid(x)		TransactionIdIsValid(XidFromFullTransactionId(x))
 #define InvalidFullTransactionId		FullTransactionIdFromEpochAndXid(0, InvalidTransactionId)
 #define FirstNormalFullTransactionId	FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId)
 #define FullTransactionIdIsNormal(x)	FullTransactionIdFollowsOrEquals(x, FirstNormalFullTransactionId)
@@ -61,12 +52,127 @@
  * A 64 bit value that contains an epoch and a TransactionId.  This is
  * wrapped in a struct to prevent implicit conversion to/from TransactionId.
  * Not all values represent valid normal XIDs.
+ *
+ * 64bit xids that would map to a non-normal 32bit xid are not allowed, for
+ * obvious reasons.
+
+ * In addition, no xids with all the upper 32bit sets are allowed to exist,
+ * this is to make it easier to catch conversion errors from 32bit xids (which
+ * can point to "before" xid 0).
  */
 typedef struct FullTransactionId
 {
 	uint64		value;
 } FullTransactionId;
 
+#define MAX_FULL_TRANSACTION_ID ((((uint64) PG_UINT32_MAX) << 32) - 1)
+
+/*
+ * This is separate from FullTransactionIdValidRange() so that input routines
+ * can check for invalid values without triggering an assert.
+ */
+static inline bool
+InFullTransactionIdRange(uint64 val)
+{
+	/* 64bit xid above the upper bound */
+	if (val > MAX_FULL_TRANSACTION_ID)
+		return false;
+
+	/*
+	 * normal 64bit xids that'd map to special 32 xids aren't allowed.
+	 */
+	if (val >= (uint64) FirstNormalTransactionId &&
+		(uint32) val < FirstNormalTransactionId)
+		return false;
+	return true;
+}
+
+/*
+ * Validation routine, typically used in asserts.
+ */
+static inline bool
+FullTransactionIdValidRange(FullTransactionId x)
+{
+	return InFullTransactionIdRange(x.value);
+}
+
+static inline uint64
+U64FromFullTransactionId(FullTransactionId x)
+{
+	Assert(FullTransactionIdValidRange(x));
+
+	return x.value;
+}
+
+static inline TransactionId
+XidFromFullTransactionId(FullTransactionId x)
+{
+	Assert(FullTransactionIdValidRange(x));
+
+	return (TransactionId) x.value;
+}
+
+static inline uint32
+EpochFromFullTransactionId(FullTransactionId x)
+{
+	Assert(FullTransactionIdValidRange(x));
+
+	return (uint32) (x.value >> 32);
+}
+
+static inline bool
+FullTransactionIdEquals(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value == b.value;
+}
+
+static inline bool
+FullTransactionIdPrecedes(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value < b.value;
+}
+
+static inline bool
+FullTransactionIdPrecedesOrEquals(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value <= b.value;
+}
+
+static inline bool
+FullTransactionIdFollows(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value > b.value;
+}
+
+static inline bool
+FullTransactionIdFollowsOrEquals(FullTransactionId a, FullTransactionId b)
+{
+	Assert(FullTransactionIdValidRange(a));
+	Assert(FullTransactionIdValidRange(b));
+
+	return a.value >= b.value;
+}
+
+static inline bool
+FullTransactionIdIsValid(FullTransactionId x)
+{
+	Assert(FullTransactionIdValidRange(x));
+
+	return x.value != (uint64) InvalidTransactionId;
+}
+
 static inline FullTransactionId
 FullTransactionIdFromEpochAndXid(uint32 epoch, TransactionId xid)
 {
@@ -74,6 +180,8 @@ FullTransactionIdFromEpochAndXid(uint32 epoch, TransactionId xid)
 
 	result.value = ((uint64) epoch) << 32 | xid;
 
+	Assert(FullTransactionIdValidRange(result));
+
 	return result;
 }
 
@@ -84,6 +192,8 @@ FullTransactionIdFromU64(uint64 value)
 
 	result.value = value;
 
+	Assert(FullTransactionIdValidRange(result));
+
 	return result;
 }
 
@@ -102,6 +212,8 @@ FullTransactionIdFromU64(uint64 value)
 static inline void
 FullTransactionIdRetreat(FullTransactionId *dest)
 {
+	Assert(FullTransactionIdValidRange(*dest));
+
 	dest->value--;
 
 	/*
@@ -118,6 +230,8 @@ FullTransactionIdRetreat(FullTransactionId *dest)
 	 */
 	while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
 		dest->value--;
+
+	Assert(FullTransactionIdValidRange(*dest));
 }
 
 /*
@@ -127,6 +241,8 @@ FullTransactionIdRetreat(FullTransactionId *dest)
 static inline void
 FullTransactionIdAdvance(FullTransactionId *dest)
 {
+	Assert(FullTransactionIdValidRange(*dest));
+
 	dest->value++;
 
 	/* see FullTransactionIdAdvance() */
@@ -135,6 +251,8 @@ FullTransactionIdAdvance(FullTransactionId *dest)
 
 	while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
 		dest->value++;
+
+	Assert(FullTransactionIdValidRange(*dest));
 }
 
 /* back up a transaction ID variable, handling wraparound correctly */
diff --git a/src/backend/utils/adt/xid.c b/src/backend/utils/adt/xid.c
index 8ac1679c381..21c0f4c67df 100644
--- a/src/backend/utils/adt/xid.c
+++ b/src/backend/utils/adt/xid.c
@@ -188,6 +188,13 @@ xid8in(PG_FUNCTION_ARGS)
 	uint64		result;
 
 	result = uint64in_subr(str, NULL, "xid8", fcinfo->context);
+
+	if (!InFullTransactionIdRange(result))
+		ereturn(fcinfo->context, 0,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("value \"%s\" is out of range for type %s",
+						str, "xid8")));
+
 	PG_RETURN_FULLTRANSACTIONID(FullTransactionIdFromU64(result));
 }
 
diff --git a/src/backend/utils/adt/xid8funcs.c b/src/backend/utils/adt/xid8funcs.c
index e616303a292..8b33d145a36 100644
--- a/src/backend/utils/adt/xid8funcs.c
+++ b/src/backend/utils/adt/xid8funcs.c
@@ -302,15 +302,18 @@ parse_snapshot(const char *str, Node *escontext)
 	const char *str_start = str;
 	char	   *endp;
 	StringInfo	buf;
+	uint64		raw_fxid;
 
-	xmin = FullTransactionIdFromU64(strtou64(str, &endp, 10));
-	if (*endp != ':')
+	raw_fxid = strtou64(str, &endp, 10);
+	if (*endp != ':' || !InFullTransactionIdRange(raw_fxid))
 		goto bad_format;
+	xmin = FullTransactionIdFromU64(raw_fxid);
 	str = endp + 1;
 
-	xmax = FullTransactionIdFromU64(strtou64(str, &endp, 10));
-	if (*endp != ':')
+	raw_fxid = strtou64(str, &endp, 10);
+	if (*endp != ':' || !InFullTransactionIdRange(raw_fxid))
 		goto bad_format;
+	xmax = FullTransactionIdFromU64(raw_fxid);
 	str = endp + 1;
 
 	/* it should look sane */
@@ -326,7 +329,11 @@ parse_snapshot(const char *str, Node *escontext)
 	while (*str != '\0')
 	{
 		/* read next value */
-		val = FullTransactionIdFromU64(strtou64(str, &endp, 10));
+		raw_fxid = strtou64(str, &endp, 10);
+
+		val = FullTransactionIdFromU64(raw_fxid);
+		if (!InFullTransactionIdRange(raw_fxid))
+			goto bad_format;
 		str = endp;
 
 		/* require the input to be in order */
diff --git a/src/test/regress/expected/xid.out b/src/test/regress/expected/xid.out
index 835077e9d57..9353a9f1c2d 100644
--- a/src/test/regress/expected/xid.out
+++ b/src/test/regress/expected/xid.out
@@ -6,11 +6,11 @@ select '010'::xid,
        '-1'::xid,
 	   '010'::xid8,
 	   '42'::xid8,
-	   '0xffffffffffffffff'::xid8,
-	   '-1'::xid8;
- xid | xid |    xid     |    xid     | xid8 | xid8 |         xid8         |         xid8         
------+-----+------------+------------+------+------+----------------------+----------------------
-   8 |  42 | 4294967295 | 4294967295 |    8 |   42 | 18446744073709551615 | 18446744073709551615
+	   '0xfffffffeffffffff'::xid8,
+	   '0'::xid8;
+ xid | xid |    xid     |    xid     | xid8 | xid8 |         xid8         | xid8 
+-----+-----+------------+------------+------+------+----------------------+------
+   8 |  42 | 4294967295 | 4294967295 |    8 |   42 | 18446744069414584319 |    0
 (1 row)
 
 -- garbage values
@@ -30,6 +30,18 @@ select 'asdf'::xid8;
 ERROR:  invalid input syntax for type xid8: "asdf"
 LINE 1: select 'asdf'::xid8;
                ^
+select '-1'::xid8;
+ERROR:  value "-1" is out of range for type xid8
+LINE 1: select '-1'::xid8;
+               ^
+select '0xffffffffffffffff'::xid8;
+ERROR:  value "0xffffffffffffffff" is out of range for type xid8
+LINE 1: select '0xffffffffffffffff'::xid8;
+               ^
+select '0x0000300000000001'::xid8;
+ERROR:  value "0x0000300000000001" is out of range for type xid8
+LINE 1: select '0x0000300000000001'::xid8;
+               ^
 -- Also try it with non-error-throwing API
 SELECT pg_input_is_valid('42', 'xid');
  pg_input_is_valid 
@@ -67,6 +79,24 @@ SELECT * FROM pg_input_error_info('0xffffffffffffffffffff', 'xid8');
  value "0xffffffffffffffffffff" is out of range for type xid8 |        |      | 22003
 (1 row)
 
+SELECT pg_input_is_valid('-1', 'xid8');
+ pg_input_is_valid 
+-------------------
+ f
+(1 row)
+
+SELECT pg_input_is_valid('0xffffffffffffffff', 'xid8');
+ pg_input_is_valid 
+-------------------
+ f
+(1 row)
+
+SELECT pg_input_is_valid('0x0000300000000001', 'xid8');
+ pg_input_is_valid 
+-------------------
+ f
+(1 row)
+
 -- equality
 select '1'::xid = '1'::xid;
  ?column? 
@@ -160,11 +190,11 @@ select xid8cmp('1', '2'), xid8cmp('2', '2'), xid8cmp('2', '1');
 
 -- min() and max() for xid8
 create table xid8_t1 (x xid8);
-insert into xid8_t1 values ('0'), ('010'), ('42'), ('0xffffffffffffffff'), ('-1');
+insert into xid8_t1 values ('0'), ('010'), ('42'), ('0xfffffffeffffffff');
 select min(x), max(x) from xid8_t1;
  min |         max          
 -----+----------------------
-   0 | 18446744073709551615
+   0 | 18446744069414584319
 (1 row)
 
 -- xid8 has btree and hash opclasses
diff --git a/src/test/regress/sql/xid.sql b/src/test/regress/sql/xid.sql
index 9f716b3653a..ed94c6d7d08 100644
--- a/src/test/regress/sql/xid.sql
+++ b/src/test/regress/sql/xid.sql
@@ -7,14 +7,17 @@ select '010'::xid,
        '-1'::xid,
 	   '010'::xid8,
 	   '42'::xid8,
-	   '0xffffffffffffffff'::xid8,
-	   '-1'::xid8;
+	   '0xfffffffeffffffff'::xid8,
+	   '0'::xid8;
 
 -- garbage values
 select ''::xid;
 select 'asdf'::xid;
 select ''::xid8;
 select 'asdf'::xid8;
+select '-1'::xid8;
+select '0xffffffffffffffff'::xid8;
+select '0x0000300000000001'::xid8;
 
 -- Also try it with non-error-throwing API
 SELECT pg_input_is_valid('42', 'xid');
@@ -23,6 +26,9 @@ SELECT * FROM pg_input_error_info('0xffffffffff', 'xid');
 SELECT pg_input_is_valid('42', 'xid8');
 SELECT pg_input_is_valid('asdf', 'xid8');
 SELECT * FROM pg_input_error_info('0xffffffffffffffffffff', 'xid8');
+SELECT pg_input_is_valid('-1', 'xid8');
+SELECT pg_input_is_valid('0xffffffffffffffff', 'xid8');
+SELECT pg_input_is_valid('0x0000300000000001', 'xid8');
 
 -- equality
 select '1'::xid = '1'::xid;
@@ -51,7 +57,7 @@ select xid8cmp('1', '2'), xid8cmp('2', '2'), xid8cmp('2', '1');
 
 -- min() and max() for xid8
 create table xid8_t1 (x xid8);
-insert into xid8_t1 values ('0'), ('010'), ('42'), ('0xffffffffffffffff'), ('-1');
+insert into xid8_t1 values ('0'), ('010'), ('42'), ('0xfffffffeffffffff');
 select min(x), max(x) from xid8_t1;
 
 -- xid8 has btree and hash opclasses
-- 
2.38.0

#31Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andres Freund (#30)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Mar 8, 2023, at 4:15 PM, Andres Freund <andres@anarazel.de> wrote:

I worked some more on the fixes for amcheck, and fixes for amcheck.

The second amcheck fix ends up correcting some inaccurate output in the tests
- previously xids from before xid 0 were reported to be in the future.

Previously there was no test case exercising exceeding nextxid, without
wrapping around into the past. I added that at the end of
004_verify_heapam.pl, because renumbering seemed too annoying.

What do you think?

The changes look reasonable to me.

Somewhat random note:

Is it intentional that we VACUUM FREEZE test ROWCOUNT times? That's
effectively O(ROWCOUNT^2), albeit with small enough constants to not really
matter. I don't think we need to insert the rows one-by-one either. Changing
that to a single INSERT and FREEZE shaves 10-12% off the tests. I didn't
change that, but we also fire off a psql for each tuple for heap_page_items(),
with offset $N no less. That seems to be another 500ms.

I don't recall the reasoning. Feel free to optimize the tests.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#32Andres Freund
andres@anarazel.de
In reply to: Mark Dilger (#31)
1 attachment(s)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hi,

On 2023-03-09 12:15:16 -0800, Mark Dilger wrote:

Somewhat random note:

Is it intentional that we VACUUM FREEZE test ROWCOUNT times? That's
effectively O(ROWCOUNT^2), albeit with small enough constants to not really
matter. I don't think we need to insert the rows one-by-one either. Changing
that to a single INSERT and FREEZE shaves 10-12% off the tests. I didn't
change that, but we also fire off a psql for each tuple for heap_page_items(),
with offset $N no less. That seems to be another 500ms.

I don't recall the reasoning. Feel free to optimize the tests.

Something like the attached.

I don't know enough perl to know how to interpolate something like
use constant ROWCOUNT => 17;
so I just made it a variable.

Greetings,

Andres Freund

Attachments:

v1-0001-pg_amcheck-Minor-test-speedups.patchtext/x-diff; charset=us-asciiDownload
From a01e1481505e74097112c0ea358e7e0eef6a5684 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 11 Mar 2023 15:15:35 -0800
Subject: [PATCH v1] pg_amcheck: Minor test speedups

Discussion: https://postgr.es/m/20230309001558.b7shzvio645ebdta@awork3.anarazel.de
---
 src/bin/pg_amcheck/t/004_verify_heapam.pl | 33 +++++++++++------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/src/bin/pg_amcheck/t/004_verify_heapam.pl b/src/bin/pg_amcheck/t/004_verify_heapam.pl
index 9984d0d9f87..e5ae7e6aada 100644
--- a/src/bin/pg_amcheck/t/004_verify_heapam.pl
+++ b/src/bin/pg_amcheck/t/004_verify_heapam.pl
@@ -217,17 +217,17 @@ my $rel = $node->safe_psql('postgres',
 my $relpath = "$pgdata/$rel";
 
 # Insert data and freeze public.test
-use constant ROWCOUNT => 17;
+my $ROWCOUNT = 17;
 $node->safe_psql(
 	'postgres', qq(
 	INSERT INTO public.test (a, b, c)
-		VALUES (
+		SELECT
 			x'DEADF9F9DEADF9F9'::bigint,
 			'abcdefg',
 			repeat('w', 10000)
-		);
-	VACUUM FREEZE public.test
-	)) for (1 .. ROWCOUNT);
+        FROM generate_series(1, $ROWCOUNT);
+	VACUUM FREEZE public.test;)
+);
 
 my $relfrozenxid = $node->safe_psql('postgres',
 	q(select relfrozenxid from pg_class where relname = 'test'));
@@ -246,16 +246,13 @@ if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
 }
 
 # Find where each of the tuples is located on the page.
-my @lp_off;
-for my $tup (0 .. ROWCOUNT - 1)
-{
-	push(
-		@lp_off,
-		$node->safe_psql(
-			'postgres', qq(
-select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
-	offset $tup limit 1)));
-}
+my @lp_off = split '\n', $node->safe_psql(
+	'postgres', qq(
+	    select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+		where lp <= $ROWCOUNT
+    )
+);
+is(scalar @lp_off, $ROWCOUNT, "acquired row offsets");
 
 # Sanity check that our 'test' table on disk layout matches expectations.  If
 # this is not so, we will have to skip the test until somebody updates the test
@@ -267,7 +264,7 @@ open($file, '+<', $relpath)
 binmode $file;
 
 my $ENDIANNESS;
-for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+for (my $tupidx = 0; $tupidx < $ROWCOUNT; $tupidx++)
 {
 	my $offnum = $tupidx + 1;        # offnum is 1-based, not zero-based
 	my $offset = $lp_off[$tupidx];
@@ -345,7 +342,7 @@ open($file, '+<', $relpath)
   or BAIL_OUT("open failed: $!");
 binmode $file;
 
-for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+for (my $tupidx = 0; $tupidx < $ROWCOUNT; $tupidx++)
 {
 	my $offnum = $tupidx + 1;        # offnum is 1-based, not zero-based
 	my $offset = $lp_off[$tupidx];
@@ -522,7 +519,7 @@ for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
 		$tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
 
 		push @expected,
-          qr/${$header}xmin ${xmin} equals or exceeds next valid transaction ID 0:\d+/;
+		  qr/${$header}xmin ${xmin} equals or exceeds next valid transaction ID 0:\d+/;
 	}
 	write_tuple($file, $offset, $tup);
 }
-- 
2.38.0

#33Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andres Freund (#32)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On Mar 11, 2023, at 3:22 PM, Andres Freund <andres@anarazel.de> wrote:

Something like the attached.

I like that your patch doesn't make the test longer. I assume you've already run the tests and that it works.

I don't know enough perl to know how to interpolate something like
use constant ROWCOUNT => 17;
so I just made it a variable.

Seems fair. I certainly don't mind.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#34Andres Freund
andres@anarazel.de
In reply to: Mark Dilger (#33)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hi,

On 2023-03-11 15:34:55 -0800, Mark Dilger wrote:

On Mar 11, 2023, at 3:22 PM, Andres Freund <andres@anarazel.de> wrote:

Something like the attached.

I like that your patch doesn't make the test longer. I assume you've already run the tests and that it works.

I did check that, yes :). My process of writing perl is certainly, uh,
iterative. No way I would get anything close to working without testing it.

CI now finished the tests as well:
https://cirrus-ci.com/build/6675457702100992

So I'll go ahead and push that.

Greetings,

Andres Freund

#35Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Andres Freund (#34)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On 12.03.23 00:41, Andres Freund wrote:

Hi,

On 2023-03-11 15:34:55 -0800, Mark Dilger wrote:

On Mar 11, 2023, at 3:22 PM, Andres Freund <andres@anarazel.de> wrote:

Something like the attached.

I like that your patch doesn't make the test longer. I assume you've already run the tests and that it works.

I did check that, yes :). My process of writing perl is certainly, uh,
iterative. No way I would get anything close to working without testing it.

CI now finished the tests as well:
https://cirrus-ci.com/build/6675457702100992

So I'll go ahead and push that.

There is a small issue with this commit (a4f23f9b3c).

In src/bin/pg_amcheck/t/004_verify_heapam.pl, there is code to detect
whether the page layout matches expectations and if not it calls plan
skip_all.

This commit adds a test

is(scalar @lp_off, $ROWCOUNT, "acquired row offsets");

*before* that skip_all call. This appears to be invalid. If the
skip_all happens, you get a complaint like

t/004_verify_heapam.pl (Wstat: 0 Tests: 1 Failed: 0)
Parse errors: Bad plan. You planned 0 tests but ran 1.

We could move the is() test after all the skip_all's. Any thoughts?

#36Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#35)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hi,

On 2023-05-10 17:44:07 +0200, Peter Eisentraut wrote:

On 12.03.23 00:41, Andres Freund wrote:

Hi,

On 2023-03-11 15:34:55 -0800, Mark Dilger wrote:

On Mar 11, 2023, at 3:22 PM, Andres Freund <andres@anarazel.de> wrote:

Something like the attached.

I like that your patch doesn't make the test longer. I assume you've already run the tests and that it works.

I did check that, yes :). My process of writing perl is certainly, uh,
iterative. No way I would get anything close to working without testing it.

CI now finished the tests as well:
https://cirrus-ci.com/build/6675457702100992

So I'll go ahead and push that.

There is a small issue with this commit (a4f23f9b3c).

In src/bin/pg_amcheck/t/004_verify_heapam.pl, there is code to detect
whether the page layout matches expectations and if not it calls plan
skip_all.

Some of these skip_all's don't seem like a good idea. Why is a broken
relfrozenxid a cause for skipping a test? But anyway, that's really unrelated
to the topic at hand.

This commit adds a test

is(scalar @lp_off, $ROWCOUNT, "acquired row offsets");

*before* that skip_all call. This appears to be invalid. If the skip_all
happens, you get a complaint like

t/004_verify_heapam.pl (Wstat: 0 Tests: 1 Failed: 0)
Parse errors: Bad plan. You planned 0 tests but ran 1.

We could move the is() test after all the skip_all's. Any thoughts?

I think the easiest fix is to just die if we can't get the offsets - it's not
like we can really continue afterwards...

Greetings,

Andres Freund

#37Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Andres Freund (#36)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

On 10.05.23 20:04, Andres Freund wrote:

This commit adds a test

is(scalar @lp_off, $ROWCOUNT, "acquired row offsets");

*before* that skip_all call. This appears to be invalid. If the skip_all
happens, you get a complaint like

t/004_verify_heapam.pl (Wstat: 0 Tests: 1 Failed: 0)
Parse errors: Bad plan. You planned 0 tests but ran 1.

We could move the is() test after all the skip_all's. Any thoughts?

I think the easiest fix is to just die if we can't get the offsets - it's not
like we can really continue afterwards...

This should do it:

-is(scalar @lp_off, $ROWCOUNT, "acquired row offsets");
+scalar @lp_off == $ROWCOUNT or BAIL_OUT("row offset counts mismatch");

But I'm not sure what the latest thinking on BAIL_OUT is. It is used
nearby in a similar way though.

#38Alexander Lakhin
exclusion@gmail.com
In reply to: Andres Freund (#34)
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Hello Andres,

12.03.2023 02:41, Andres Freund wrote:

CI now finished the tests as well:
https://cirrus-ci.com/build/6675457702100992

So I'll go ahead and push that.

As I mentioned at [1]/messages/by-id/72705e42-42d1-ac6e-e7d5-4baec8a0d2af@gmail.com, `meson test` fails on Windows x86 platform during
the test pg_amcheck/004_verify_heapam (I'm using VS 2022 Version 17.9.7):
meson setup build --wipe -Dcassert=true
cd build & ninja & meson test

... postgresql:pg_amcheck / pg_amcheck/004_verify_heapam ERROR             6.95s   exit status 25

004_verify_heapam_test.log contains:
TRAP: failed Assert("FullTransactionIdIsNormal(fxid)"), File: "../contrib/amcheck/verify_heapam.c", Line: 1915, PID: 2560
2024-07-04 20:56:54.592 PDT [9780] LOG:  server process (PID 2560) was terminated by exception 0xC0000409
2024-07-04 20:56:54.592 PDT [9780] DETAIL:  Failed process was running: SELECT v.blkno, v.offnum, v.attnum, v.msg FROM
pg_catalog.pg_class c, "public".verify_heapam(
    relation := c.oid, on_error_stop := false, check_toast := true, skip := 'none'
    ) v WHERE c.oid = 16438 AND c.relpersistence != 't'

`git bisect` for this anomaly pointed at 4f5d461e0.
(I couldn't compile Postgres on that commit, but with
`git show 53ea2b7ad | git apply` (see also [2]/messages/by-id/17967-cd21e34a314141b2@postgresql.org) it's possible.)

The Assert in question is:
    else
        fxid = FullTransactionIdFromU64(nextfxid_i - diff);

    Assert(FullTransactionIdIsNormal(fxid));

It was not clear to me how it comes out that fxid is not normal, until I
looked at the disassembly:
    else
        fxid = FullTransactionIdFromU64(nextfxid_i - diff);
751812D2  sub         ebx,eax
751812D4  sbb         edi,edx

    Assert(FullTransactionIdIsNormal(fxid));
751812D6  jne         FullTransactionIdFromXidAndCtx+0E6h (751812F6h)
751812D8  jb          FullTransactionIdFromXidAndCtx+0CFh (751812DFh)
751812DA  cmp         ebx,3
751812DD  jae         FullTransactionIdFromXidAndCtx+0E6h (751812F6h)
751812DF  push        77Bh
751812E4  push        offset string "../contrib/amcheck/verify_heapa@"... (7518C4A4h)
751812E9  push        offset string "FullTransactionIdIsNormal(fxid)" (7518DB04h)
751812EE  call        _ExceptionalCondition (75189FFEh)

The same code fragment for your convenience:
https://ideone.com/8wiGRY

Could you please look at this?

[1]: /messages/by-id/72705e42-42d1-ac6e-e7d5-4baec8a0d2af@gmail.com
[2]: /messages/by-id/17967-cd21e34a314141b2@postgresql.org

Best regards,
Alexander