pg_upgrade can result in early wraparound on databases with high transaction load

Started by Jason Harvey over 6 years ago, 21 messages
#1 Jason Harvey
jason@reddit.com

Hello,

This week I upgraded one of my large (2.8TB), high-volume databases from 9
to 11. The upgrade itself went fine. About two days later, we unexpectedly
hit transaction ID wraparound. What was perplexing about this was that the
age of our oldest `datfrozenxid` was only 1.2 billion - far away from where
I'd expect a wraparound. Curiously, the wraparound error referred to a
mysterious database of `OID 0`:

UPDATE ERROR: database is not accepting commands to avoid wraparound data
loss in database with OID 0

We were able to recover after a few hours by greatly speeding up our vacuum
on our largest table.

In a followup investigation I uncovered the reason we hit the wraparound so
early, and also the cause of the mysterious OID 0 message. When pg_upgrade
executes, it calls pg_resetwal to set the next transaction ID. Within
pg_resetwal is the following code:
https://github.com/postgres/postgres/blob/6cd404b344f7e27f4d64555bb133f18a758fe851/src/bin/pg_resetwal/pg_resetwal.c#L440-L450

This sets the controldata to have a fake database (OID 0) on the brink of
transaction wraparound. Specifically, after pg_upgrade is run, wraparound
will occur within around 140 million transactions (provided the autovacuum
doesn't finish first). I confirmed by analyzing our controldata before and
after the upgrade that this was the cause of our early wraparound.

Given the size and heavy volume of our database, we tend to complete a
vacuum in the time it takes around 250 million transactions to execute.
With our tunings this tends to be rather safe and we stay well away from
the wraparound point under normal circumstances.

Unfortunately we had no obvious way of knowing that the upgrade would place
our database upon the brink of wraparound. In fact, since this info is only
persisted in the controldata, the only way to discover this state, to my
knowledge, would be to inspect the controldata itself. Other standard means
of monitoring for wraparound risk involve watching `pg_database` or
`pg_class`, which in this case tells us nothing helpful since the fake
database present in the controldata is not represented in those stats.
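
For illustration, here is roughly how the two views compare (a sketch
assuming stock catalogs and pg_controldata output; the ages shown will of
course differ per cluster):

# Standard monitoring only sees real databases, not the fake OID 0 entry:
psql -XAtc "SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;"

# The post-upgrade wraparound limit is only visible in pg_control itself:
pg_controldata "$PGDATA" | grep -E "NextXID|oldestXID"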

I'd like to suggest that either the pg_upgrade->pg_resetwal behaviour be
adjusted, or the pg_upgrade documentation highlight this potential
scenario. I'm happy to contribute code and/or documentation pull requests
to accomplish this.

Thank you,
Jason Harvey
reddit.com

#2 Peter Geoghegan
pg@bowt.ie
In reply to: Jason Harvey (#1)
Re: pg_upgrade can result in early wraparound on databases with high transaction load

On Mon, May 20, 2019 at 3:10 AM Jason Harvey <jason@reddit.com> wrote:

This week I upgraded one of my large(2.8TB), high-volume databases from 9 to 11. The upgrade itself went fine. About two days later, we unexpectedly hit transaction ID wraparound. What was perplexing about this was that the age of our oldest `datfrozenxid` was only 1.2 billion - far away from where I'd expect a wraparound. Curiously, the wraparound error referred to a mysterious database of `OID 0`:

UPDATE ERROR: database is not accepting commands to avoid wraparound data loss in database with OID 0

We were able to recover after a few hours by greatly speeding up our vacuum on our largest table.

In a followup investigation I uncovered the reason we hit the wraparound so early, and also the cause of the mysterious OID 0 message. When pg_upgrade executes, it calls pg_resetwal to set the next transaction ID. Within pg_resetwal is the following code: https://github.com/postgres/postgres/blob/6cd404b344f7e27f4d64555bb133f18a758fe851/src/bin/pg_resetwal/pg_resetwal.c#L440-L450

This sets the controldata to have a fake database (OID 0) on the brink of transaction wraparound. Specifically, after pg_upgrade is ran, wraparound will occur within around 140 million transactions (provided the autovacuum doesn't finish first). I confirmed by analyzing our controldata before and after the upgrade that this was the cause of our early wraparound.

Given the size and heavy volume of our database, we tend to complete a vacuum in the time it takes around 250 million transactions to execute. With our tunings this tends to be rather safe and we stay well away from the wraparound point under normal circumstances.

This does seem like an unfriendly behavior. Moving the thread over to
the -hackers list for further discussion...

--
Peter Geoghegan

#3 Noah Misch
noah@leadboat.com
In reply to: Peter Geoghegan (#2)
Re: pg_upgrade can result in early wraparound on databases with high transaction load

On Tue, May 21, 2019 at 03:23:00PM -0700, Peter Geoghegan wrote:

On Mon, May 20, 2019 at 3:10 AM Jason Harvey <jason@reddit.com> wrote:

This week I upgraded one of my large(2.8TB), high-volume databases from 9 to 11. The upgrade itself went fine. About two days later, we unexpectedly hit transaction ID wraparound. What was perplexing about this was that the age of our oldest `datfrozenxid` was only 1.2 billion - far away from where I'd expect a wraparound. Curiously, the wraparound error referred to a mysterious database of `OID 0`:

UPDATE ERROR: database is not accepting commands to avoid wraparound data loss in database with OID 0

That's bad.

We were able to recover after a few hours by greatly speeding up our vacuum on our largest table.

For what it's worth, a quicker workaround is to VACUUM FREEZE any database,
however small. That forces a vac_truncate_clog(), which recomputes the wrap
point from pg_database.datfrozenxid values. This demonstrates the workaround:

--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -248,7 +248,10 @@ case $testhost in
 esac
 pg_dumpall --no-sync -f "$temp_root"/dump2.sql || pg_dumpall2_status=$?
+pg_controldata "${PGDATA}"
+vacuumdb -F template1
 pg_ctl -m fast stop
+pg_controldata "${PGDATA}"

if [ -n "$pg_dumpall2_status" ]; then
echo "pg_dumpall of post-upgrade database cluster failed"

In a followup investigation I uncovered the reason we hit the wraparound so early, and also the cause of the mysterious OID 0 message. When pg_upgrade executes, it calls pg_resetwal to set the next transaction ID. Within pg_resetwal is the following code: https://github.com/postgres/postgres/blob/6cd404b344f7e27f4d64555bb133f18a758fe851/src/bin/pg_resetwal/pg_resetwal.c#L440-L450

pg_upgrade should set oldestXID to the same value as the source cluster or set
it like vac_truncate_clog() would set it. Today's scheme is usually too
pessimistic, but it can be too optimistic if the source cluster was on the
brink of wrap. Thanks for the report.
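
For context, the value vac_truncate_clog() derives is essentially the
minimum datfrozenxid across pg_database; as a quick sketch, the database
holding that limit back can be seen with:

psql -XAtc "SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database ORDER BY age(datfrozenxid) DESC LIMIT 1;"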

#4 Andres Freund
andres@anarazel.de
In reply to: Noah Misch (#3)
Re: pg_upgrade can result in early wraparound on databases with high transaction load

Hi,

On 2019-06-15 11:37:59 -0700, Noah Misch wrote:

On Tue, May 21, 2019 at 03:23:00PM -0700, Peter Geoghegan wrote:

On Mon, May 20, 2019 at 3:10 AM Jason Harvey <jason@reddit.com> wrote:

This week I upgraded one of my large(2.8TB), high-volume databases from 9 to 11. The upgrade itself went fine. About two days later, we unexpectedly hit transaction ID wraparound. What was perplexing about this was that the age of our oldest `datfrozenxid` was only 1.2 billion - far away from where I'd expect a wraparound. Curiously, the wraparound error referred to a mysterious database of `OID 0`:

UPDATE ERROR: database is not accepting commands to avoid wraparound data loss in database with OID 0

That's bad.

Yea. The code triggering it in pg_resetwal is bogus as far as I can
tell. That pg_upgrade triggers it makes this quite bad.

I just hit issues related to it when writing a wraparound handling
test. Peter remembered this issue (how?)...

Especially before 13 (inserts triggering autovacuum) it is quite common
to have tables that only ever get vacuumed due to anti-wraparound
vacuums. And it's common for larger databases to increase
autovacuum_freeze_max_age. Which makes it fairly likely for this to
guess an oldestXid value that's *newer* than an accurate one. Since
oldestXid is used in a few important-ish places (like triggering
vacuums, and in 14 also some snapshot related logic) I think that's bad.

The relevant code:

	if (set_xid != 0)
	{
		ControlFile.checkPointCopy.nextXid =
			FullTransactionIdFromEpochAndXid(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextXid),
											 set_xid);

		/*
		 * For the moment, just set oldestXid to a value that will force
		 * immediate autovacuum-for-wraparound.  It's not clear whether adding
		 * user control of this is useful, so let's just do something that's
		 * reasonably safe.  The magic constant here corresponds to the
		 * maximum allowed value of autovacuum_freeze_max_age.
		 */
		ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
		if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
			ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
		ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
	}
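
For reference, the headroom this leaves can be computed directly (plain
64-bit shell arithmetic, nothing cluster-specific):

# Wraparound is ~2^31 XIDs past oldestXid; the guessed oldestXid is
# set_xid - 2 billion, so the remaining headroom after the reset is:
echo $(( (1 << 31) - 2000000000 ))    # 147483648, i.e. ~147 million XIDs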

Originally from:

commit 25ec228ef760eb91c094cc3b6dea7257cc22ffb5
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: 2009-08-31 02:23:23 +0000

Track the current XID wrap limit (or more accurately, the oldest unfrozen
XID) in checkpoint records. This eliminates the need to recompute the value
from scratch during database startup, which is one of the two remaining
reasons for the flatfile code to exist. It should also simplify life for
hot-standby operation.

I think we should remove the oldestXid guessing logic, and expose it as
an explicit option. I think it's important that pg_upgrade sets an
accurate value. Probably not worth caring about oldestXidDB though?

Greetings,

Andres Freund

#5 Justin Pryzby
pryzby@telsasoft.com
In reply to: Andres Freund (#4)
Re: pg_upgrade can result in early wraparound on databases with high transaction load

On Fri, Apr 23, 2021 at 04:42:56PM -0700, Andres Freund wrote:

Hi,

On 2019-06-15 11:37:59 -0700, Noah Misch wrote:

On Tue, May 21, 2019 at 03:23:00PM -0700, Peter Geoghegan wrote:

On Mon, May 20, 2019 at 3:10 AM Jason Harvey <jason@reddit.com> wrote:

This week I upgraded one of my large(2.8TB), high-volume databases from 9 to 11. The upgrade itself went fine. About two days later, we unexpectedly hit transaction ID wraparound. What was perplexing about this was that the age of our oldest `datfrozenxid` was only 1.2 billion - far away from where I'd expect a wraparound. Curiously, the wraparound error referred to a mysterious database of `OID 0`:

UPDATE ERROR: database is not accepting commands to avoid wraparound data loss in database with OID 0

That's bad.

Yea. The code triggering it in pg_resetwal is bogus as far as I can
tell. That pg_upgrade triggers it makes this quite bad.

I just hit issues related to it when writing a wraparound handling
test. Peter remembered this issue (how?)...

Especially before 13 (inserts triggering autovacuum) it is quite common
to have tables that only ever get vacuumed due to anti-wraparound
vacuums. And it's common for larger databases to increase
autovacuum_freeze_max_age. Which makes it fairly likely for this to
guess an oldestXid value that's *newer* than an accurate one. Since
oldestXid is used in a few important-ish places (like triggering
vacuums, and in 14 also some snapshot related logic) I think that's bad.

The relevant code:

	if (set_xid != 0)
	{
		ControlFile.checkPointCopy.nextXid =
			FullTransactionIdFromEpochAndXid(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextXid),
											 set_xid);

		/*
		 * For the moment, just set oldestXid to a value that will force
		 * immediate autovacuum-for-wraparound.  It's not clear whether adding
		 * user control of this is useful, so let's just do something that's
		 * reasonably safe.  The magic constant here corresponds to the
		 * maximum allowed value of autovacuum_freeze_max_age.
		 */
		ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
		if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
			ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
		ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
	}

Originally from:

commit 25ec228ef760eb91c094cc3b6dea7257cc22ffb5
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: 2009-08-31 02:23:23 +0000

Track the current XID wrap limit (or more accurately, the oldest unfrozen
XID) in checkpoint records. This eliminates the need to recompute the value
from scratch during database startup, which is one of the two remaining
reasons for the flatfile code to exist. It should also simplify life for
hot-standby operation.

I think we should remove the oldestXid guessing logic, and expose it as
an explicit option. I think it's important that pg_upgrade sets an
accurate value. Probably not worth caring about oldestXidDB though?

This (combination of) thread(s) seems relevant.

Subject: pg_upgrade failing for 200+ million Large Objects
/messages/by-id/12601596dbbc4c01b86b4ac4d2bd4d48@EX13D05UWC001.ant.amazon.com
/messages/by-id/a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com
/messages/by-id/cc089cc3-fc43-9904-fdba-d830d8222145@enterprisedb.com

Allows the user to provide a constant via pg_upgrade command-line, that
overrides the 2 billion constant in pg_resetxlog [1] thereby increasing the
(window of) Transaction IDs available for pg_upgrade to complete.

--
Justin

#6 Andres Freund
andres@anarazel.de
In reply to: Justin Pryzby (#5)
Re: pg_upgrade can result in early wraparound on databases with high transaction load

Hi,

On 2021-04-23 19:28:27 -0500, Justin Pryzby wrote:

This (combination of) thread(s) seems relevant.

Subject: pg_upgrade failing for 200+ million Large Objects
/messages/by-id/12601596dbbc4c01b86b4ac4d2bd4d48@EX13D05UWC001.ant.amazon.com
/messages/by-id/a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com
/messages/by-id/cc089cc3-fc43-9904-fdba-d830d8222145@enterprisedb.com

Huh. Thanks for digging these up.

Allows the user to provide a constant via pg_upgrade command-line, that
overrides the 2 billion constant in pg_resetxlog [1] thereby increasing the
(window of) Transaction IDs available for pg_upgrade to complete.

That seems entirely the wrong approach to me, buying further into
the broken idea of inventing random wrong values for oldestXid.

We drive important things like the emergency xid limits off oldestXid. On
databases with tables that are older than ~147million xids (i.e. not even
affected by the default autovacuum_freeze_max_age) the current constant leads
to setting the oldestXid to a value *in the future*/wrapped around. Any
different constant (or pg_upgrade parameter) will do that too in
other scenarios.

As far as I can tell there is precisely *no* correct behaviour here other than
exactly copying the oldestXid limit from the source database.

Greetings,

Andres Freund

#7 Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Andres Freund (#6)
1 attachment(s)
Re: pg_upgrade can result in early wraparound on databases with high transaction load

Hi,

On 4/24/21 3:00 AM, Andres Freund wrote:

Hi,

On 2021-04-23 19:28:27 -0500, Justin Pryzby wrote:

This (combination of) thread(s) seems relevant.

Subject: pg_upgrade failing for 200+ million Large Objects
/messages/by-id/12601596dbbc4c01b86b4ac4d2bd4d48@EX13D05UWC001.ant.amazon.com
/messages/by-id/a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com
/messages/by-id/cc089cc3-fc43-9904-fdba-d830d8222145@enterprisedb.com

Huh. Thanks for digging these up.

Allows the user to provide a constant via pg_upgrade command-line, that
overrides the 2 billion constant in pg_resetxlog [1] thereby increasing the
(window of) Transaction IDs available for pg_upgrade to complete.

That seems the entirely the wrong approach to me, buying further into
the broken idea of inventing random wrong values for oldestXid.

We drive important things like the emergency xid limits off oldestXid. On
databases with tables that are older than ~147million xids (i.e. not even
affected by the default autovacuum_freeze_max_age) the current constant leads
to setting the oldestXid to a value *in the future*/wrapped around. Any
different different constant (or pg_upgrade parameter) will do that too in
other scenarios.

As far as I can tell there is precisely *no* correct behaviour here other than
exactly copying the oldestXid limit from the source database.

Please find attached a patch proposal doing so: it adds a new (-u)
parameter to pg_resetwal that allows specifying the oldest unfrozen XID
to set.
This new parameter is then used in pg_upgrade to copy the source
cluster's "Latest checkpoint's oldestXID".
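
As a rough manual sketch of what the patched pg_upgrade does under the
hood (OLD_PGDATA and NEW_PGDATA are placeholder paths):

OLDEST_XID=$(pg_controldata "$OLD_PGDATA" | awk '/oldestXID:/ {print $NF}')
pg_resetwal -f -u "$OLDEST_XID" "$NEW_PGDATA"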

Questions:

* Should we keep the old behavior in case -x is being used without -u?
(The proposed patch does not set an arbitrary oldestXID anymore in
case -x is used.)
* Also shouldn't we ensure that the xid provided with -x or -u is >=
FirstNormalTransactionId (Currently the only check is that it is != 0)?

I'm adding this patch to the commitfest.

Bertrand

Attachments:

v1-0001-pg-upgrade-keep-oldestxid.patch (text/plain)
 src/bin/pg_resetwal/pg_resetwal.c | 61 ++++++++++++++++++++++-----------------
 src/bin/pg_upgrade/controldata.c  | 17 ++++++++++-
 src/bin/pg_upgrade/pg_upgrade.c   |  6 ++++
 src/bin/pg_upgrade/pg_upgrade.h   |  1 +
 4 files changed, 58 insertions(+), 27 deletions(-)
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 805dafef07..66c746de11 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -65,6 +65,7 @@ static bool guessed = false;	/* T if we had to guess at any values */
 static const char *progname;
 static uint32 set_xid_epoch = (uint32) -1;
 static TransactionId set_xid = 0;
+static TransactionId set_oldest_unfrozen_xid = 0;
 static TransactionId set_oldest_commit_ts_xid = 0;
 static TransactionId set_newest_commit_ts_xid = 0;
 static Oid	set_oid = 0;
@@ -102,6 +103,7 @@ main(int argc, char *argv[])
 		{"next-oid", required_argument, NULL, 'o'},
 		{"multixact-offset", required_argument, NULL, 'O'},
 		{"next-transaction-id", required_argument, NULL, 'x'},
+		{"oldest-transaction-id", required_argument, NULL, 'u'},
 		{"wal-segsize", required_argument, NULL, 1},
 		{NULL, 0, NULL, 0}
 	};
@@ -135,7 +137,7 @@ main(int argc, char *argv[])
 	}
 
 
-	while ((c = getopt_long(argc, argv, "c:D:e:fl:m:no:O:x:", long_options, NULL)) != -1)
+	while ((c = getopt_long(argc, argv, "c:D:e:fl:m:no:O:x:u:", long_options, NULL)) != -1)
 	{
 		switch (c)
 		{
@@ -183,6 +185,21 @@ main(int argc, char *argv[])
 				}
 				break;
 
+			case 'u':
+				set_oldest_unfrozen_xid = strtoul(optarg, &endptr, 0);
+				if (endptr == optarg || *endptr != '\0')
+				{
+					pg_log_error("invalid argument for option %s", "-u");
+					fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+					exit(1);
+				}
+				if (set_oldest_unfrozen_xid == 0)
+				{
+					pg_log_error("oldest unfrozen transaction ID (-u) must not be 0");
+					exit(1);
+				}
+				break;
+
 			case 'c':
 				set_oldest_commit_ts_xid = strtoul(optarg, &endptr, 0);
 				if (endptr == optarg || *endptr != ',')
@@ -429,21 +446,12 @@ main(int argc, char *argv[])
 											 XidFromFullTransactionId(ControlFile.checkPointCopy.nextXid));
 
 	if (set_xid != 0)
-	{
 		ControlFile.checkPointCopy.nextXid =
 			FullTransactionIdFromEpochAndXid(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextXid),
 											 set_xid);
 
-		/*
-		 * For the moment, just set oldestXid to a value that will force
-		 * immediate autovacuum-for-wraparound.  It's not clear whether adding
-		 * user control of this is useful, so let's just do something that's
-		 * reasonably safe.  The magic constant here corresponds to the
-		 * maximum allowed value of autovacuum_freeze_max_age.
-		 */
-		ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
-		if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
-			ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
+	if (set_oldest_unfrozen_xid != 0) {
+		ControlFile.checkPointCopy.oldestXid = set_oldest_unfrozen_xid;
 		ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
 	}
 
@@ -1209,20 +1217,21 @@ usage(void)
 	printf(_("Usage:\n  %s [OPTION]... DATADIR\n\n"), progname);
 	printf(_("Options:\n"));
 	printf(_("  -c, --commit-timestamp-ids=XID,XID\n"
-			 "                                 set oldest and newest transactions bearing\n"
-			 "                                 commit timestamp (zero means no change)\n"));
-	printf(_(" [-D, --pgdata=]DATADIR          data directory\n"));
-	printf(_("  -e, --epoch=XIDEPOCH           set next transaction ID epoch\n"));
-	printf(_("  -f, --force                    force update to be done\n"));
-	printf(_("  -l, --next-wal-file=WALFILE    set minimum starting location for new WAL\n"));
-	printf(_("  -m, --multixact-ids=MXID,MXID  set next and oldest multitransaction ID\n"));
-	printf(_("  -n, --dry-run                  no update, just show what would be done\n"));
-	printf(_("  -o, --next-oid=OID             set next OID\n"));
-	printf(_("  -O, --multixact-offset=OFFSET  set next multitransaction offset\n"));
-	printf(_("  -V, --version                  output version information, then exit\n"));
-	printf(_("  -x, --next-transaction-id=XID  set next transaction ID\n"));
-	printf(_("      --wal-segsize=SIZE         size of WAL segments, in megabytes\n"));
-	printf(_("  -?, --help                     show this help, then exit\n"));
+			 "                                   set oldest and newest transactions bearing\n"
+			 "                                   commit timestamp (zero means no change)\n"));
+	printf(_(" [-D, --pgdata=]DATADIR            data directory\n"));
+	printf(_("  -e, --epoch=XIDEPOCH             set next transaction ID epoch\n"));
+	printf(_("  -f, --force                      force update to be done\n"));
+	printf(_("  -l, --next-wal-file=WALFILE      set minimum starting location for new WAL\n"));
+	printf(_("  -m, --multixact-ids=MXID,MXID    set next and oldest multitransaction ID\n"));
+	printf(_("  -n, --dry-run                    no update, just show what would be done\n"));
+	printf(_("  -o, --next-oid=OID               set next OID\n"));
+	printf(_("  -O, --multixact-offset=OFFSET    set next multitransaction offset\n"));
+	printf(_("  -u, --oldest-transaction-id=XID  set oldest unfrozen transaction ID\n"));
+	printf(_("  -V, --version                    output version information, then exit\n"));
+	printf(_("  -x, --next-transaction-id=XID    set next transaction ID\n"));
+	printf(_("      --wal-segsize=SIZE           size of WAL segments, in megabytes\n"));
+	printf(_("  -?, --help                       show this help, then exit\n"));
 	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
 	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
 }
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf33..a4b6375403 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -44,6 +44,7 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 	bool		got_oid = false;
 	bool		got_multi = false;
 	bool		got_oldestmulti = false;
+	bool		got_oldestxid = false;
 	bool		got_mxoff = false;
 	bool		got_nextxlogfile = false;
 	bool		got_float8_pass_by_value = false;
@@ -312,6 +313,17 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 			cluster->controldata.chkpnt_nxtmulti = str2uint(p);
 			got_multi = true;
 		}
+		else if ((p = strstr(bufin, "Latest checkpoint's oldestXID:")) != NULL)
+		{
+			p = strchr(p, ':');
+
+			if (p == NULL || strlen(p) <= 1)
+				pg_fatal("%d: controldata retrieval problem\n", __LINE__);
+
+			p++;				/* remove ':' char */
+			cluster->controldata.chkpnt_oldstxid = str2uint(p);
+			got_oldestxid = true;
+		}
 		else if ((p = strstr(bufin, "Latest checkpoint's oldestMultiXid:")) != NULL)
 		{
 			p = strchr(p, ':');
@@ -544,7 +556,7 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 
 	/* verify that we got all the mandatory pg_control data */
 	if (!got_xid || !got_oid ||
-		!got_multi ||
+		!got_multi || !got_oldestxid ||
 		(!got_oldestmulti &&
 		 cluster->controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER) ||
 		!got_mxoff || (!live_check && !got_nextxlogfile) ||
@@ -575,6 +587,9 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 			cluster->controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
 			pg_log(PG_REPORT, "  latest checkpoint oldest MultiXactId\n");
 
+		if (!got_oldestxid)
+			pg_log(PG_REPORT, "  latest checkpoint oldestXID\n");
+
 		if (!got_mxoff)
 			pg_log(PG_REPORT, "  latest checkpoint next MultiXactOffset\n");
 
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index e23b8ca88d..950ff24980 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -473,6 +473,12 @@ copy_xact_xlog_xid(void)
 			  "\"%s/pg_resetwal\" -f -x %u \"%s\"",
 			  new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid,
 			  new_cluster.pgdata);
+	check_ok();
+	prep_status("Setting oldest XID for new cluster");
+	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
+			  "\"%s/pg_resetwal\" -f -u %u \"%s\"",
+			  new_cluster.bindir, old_cluster.controldata.chkpnt_oldstxid,
+			  new_cluster.pgdata);
 	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
 			  "\"%s/pg_resetwal\" -f -e %u \"%s\"",
 			  new_cluster.bindir, old_cluster.controldata.chkpnt_nxtepoch,
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index d7666da3f2..b775ba31e5 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -207,6 +207,7 @@ typedef struct
 	uint32		chkpnt_nxtmulti;
 	uint32		chkpnt_nxtmxoff;
 	uint32		chkpnt_oldstMulti;
+	uint32		chkpnt_oldstxid;
 	uint32		align;
 	uint32		blocksz;
 	uint32		largesz;
#8 Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Drouvot, Bertrand (#7)
1 attachment(s)
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

Hi,

On 5/4/21 10:17 AM, Drouvot, Bertrand wrote:

Hi,

On 4/24/21 3:00 AM, Andres Freund wrote:

Hi,

On 2021-04-23 19:28:27 -0500, Justin Pryzby wrote:

This (combination of) thread(s) seems relevant.

Subject: pg_upgrade failing for 200+ million Large Objects
/messages/by-id/12601596dbbc4c01b86b4ac4d2bd4d48@EX13D05UWC001.ant.amazon.com
/messages/by-id/a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com
/messages/by-id/cc089cc3-fc43-9904-fdba-d830d8222145@enterprisedb.com

Huh. Thanks for digging these up.

Allows the user to provide a constant via pg_upgrade command-line, that
overrides the 2 billion constant in pg_resetxlog [1] thereby increasing the
(window of) Transaction IDs available for pg_upgrade to complete.

That seems the entirely the wrong approach to me, buying further into
the broken idea of inventing random wrong values for oldestXid.

We drive important things like the emergency xid limits off oldestXid. On
databases with tables that are older than ~147million xids (i.e. not even
affected by the default autovacuum_freeze_max_age) the current constant leads
to setting the oldestXid to a value *in the future*/wrapped around. Any
different different constant (or pg_upgrade parameter) will do that too in
other scenarios.

As far as I can tell there is precisely *no* correct behaviour here other than
exactly copying the oldestXid limit from the source database.

Please find attached a patch proposal doing so: it adds a new (- u)
parameter to pg_resetwal that allows to specify the oldest unfrozen
XID to set.
Then this new parameter is being used in pg_upgrade to copy the source
Latest checkpoint's oldestXID.

Questions:

* Should we keep the old behavior in case -x is being used without
-u? (The proposed patch does not set an arbitrary oldestXID
anymore in case -x is used.)
* Also shouldn't we ensure that the xid provided with -x or -u is >=
FirstNormalTransactionId (Currently the only check is that it is != 0)?

Copy/pasting Andres feedback (Thanks Andres for this feedback) on those
questions from another thread [1].

I was also wondering if:

* We should keep the old behavior in case pg_resetwal -x is being used
  without -u? (The proposed patch does not set an arbitrary oldestXID
  anymore in case -x is used)

Andres: I don't think we should. I don't see anything in the old
behaviour worth maintaining.

* We should ensure that the xid provided with -x or -u is >=
  FirstNormalTransactionId (Currently the only check is that it is != 0)?

Andres: Applying TransactionIdIsNormal() seems like a good idea.

=> I am attaching a new version that makes use of
TransactionIdIsNormal() checks.

Andres: I think it's important to verify that the xid provided with -x
is within a reasonable range of the oldest xid.

=> What do you mean by "a reasonable range"?

Thanks

Bertrand

[1]: /messages/by-id/20210517185646.pwe4klaufwmdhe2a@alap3.anarazel.de

Attachments:

v1-0002-pg-upgrade-keep-oldestxid.patch (text/plain)
 src/bin/pg_resetwal/pg_resetwal.c | 65 ++++++++++++++++++++++-----------------
 src/bin/pg_upgrade/controldata.c  | 17 +++++++++-
 src/bin/pg_upgrade/pg_upgrade.c   |  6 ++++
 src/bin/pg_upgrade/pg_upgrade.h   |  1 +
 4 files changed, 60 insertions(+), 29 deletions(-)
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 805dafef07..5e864760ed 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -65,6 +65,7 @@ static bool guessed = false;	/* T if we had to guess at any values */
 static const char *progname;
 static uint32 set_xid_epoch = (uint32) -1;
 static TransactionId set_xid = 0;
+static TransactionId set_oldest_unfrozen_xid = 0;
 static TransactionId set_oldest_commit_ts_xid = 0;
 static TransactionId set_newest_commit_ts_xid = 0;
 static Oid	set_oid = 0;
@@ -102,6 +103,7 @@ main(int argc, char *argv[])
 		{"next-oid", required_argument, NULL, 'o'},
 		{"multixact-offset", required_argument, NULL, 'O'},
 		{"next-transaction-id", required_argument, NULL, 'x'},
+		{"oldest-transaction-id", required_argument, NULL, 'u'},
 		{"wal-segsize", required_argument, NULL, 1},
 		{NULL, 0, NULL, 0}
 	};
@@ -135,7 +137,7 @@ main(int argc, char *argv[])
 	}
 
 
-	while ((c = getopt_long(argc, argv, "c:D:e:fl:m:no:O:x:", long_options, NULL)) != -1)
+	while ((c = getopt_long(argc, argv, "c:D:e:fl:m:no:O:x:u:", long_options, NULL)) != -1)
 	{
 		switch (c)
 		{
@@ -176,9 +178,24 @@ main(int argc, char *argv[])
 					fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 					exit(1);
 				}
-				if (set_xid == 0)
+				if (!TransactionIdIsNormal(set_xid))
 				{
-					pg_log_error("transaction ID (-x) must not be 0");
+					pg_log_error("transaction ID (-x) must be greater or equal to %u", FirstNormalTransactionId);
+					exit(1);
+				}
+				break;
+
+			case 'u':
+				set_oldest_unfrozen_xid = strtoul(optarg, &endptr, 0);
+				if (endptr == optarg || *endptr != '\0')
+				{
+					pg_log_error("invalid argument for option %s", "-u");
+					fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+					exit(1);
+				}
+				if (!TransactionIdIsNormal(set_oldest_unfrozen_xid))
+				{
+					pg_log_error("oldest unfrozen transaction ID (-u) must be greater or equal to %u", FirstNormalTransactionId);
 					exit(1);
 				}
 				break;
@@ -429,21 +446,12 @@ main(int argc, char *argv[])
 											 XidFromFullTransactionId(ControlFile.checkPointCopy.nextXid));
 
 	if (set_xid != 0)
-	{
 		ControlFile.checkPointCopy.nextXid =
 			FullTransactionIdFromEpochAndXid(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextXid),
 											 set_xid);
 
-		/*
-		 * For the moment, just set oldestXid to a value that will force
-		 * immediate autovacuum-for-wraparound.  It's not clear whether adding
-		 * user control of this is useful, so let's just do something that's
-		 * reasonably safe.  The magic constant here corresponds to the
-		 * maximum allowed value of autovacuum_freeze_max_age.
-		 */
-		ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
-		if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
-			ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
+	if (set_oldest_unfrozen_xid != 0) {
+		ControlFile.checkPointCopy.oldestXid = set_oldest_unfrozen_xid;
 		ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
 	}
 
@@ -1209,20 +1217,21 @@ usage(void)
 	printf(_("Usage:\n  %s [OPTION]... DATADIR\n\n"), progname);
 	printf(_("Options:\n"));
 	printf(_("  -c, --commit-timestamp-ids=XID,XID\n"
-			 "                                 set oldest and newest transactions bearing\n"
-			 "                                 commit timestamp (zero means no change)\n"));
-	printf(_(" [-D, --pgdata=]DATADIR          data directory\n"));
-	printf(_("  -e, --epoch=XIDEPOCH           set next transaction ID epoch\n"));
-	printf(_("  -f, --force                    force update to be done\n"));
-	printf(_("  -l, --next-wal-file=WALFILE    set minimum starting location for new WAL\n"));
-	printf(_("  -m, --multixact-ids=MXID,MXID  set next and oldest multitransaction ID\n"));
-	printf(_("  -n, --dry-run                  no update, just show what would be done\n"));
-	printf(_("  -o, --next-oid=OID             set next OID\n"));
-	printf(_("  -O, --multixact-offset=OFFSET  set next multitransaction offset\n"));
-	printf(_("  -V, --version                  output version information, then exit\n"));
-	printf(_("  -x, --next-transaction-id=XID  set next transaction ID\n"));
-	printf(_("      --wal-segsize=SIZE         size of WAL segments, in megabytes\n"));
-	printf(_("  -?, --help                     show this help, then exit\n"));
+			 "                                   set oldest and newest transactions bearing\n"
+			 "                                   commit timestamp (zero means no change)\n"));
+	printf(_(" [-D, --pgdata=]DATADIR            data directory\n"));
+	printf(_("  -e, --epoch=XIDEPOCH             set next transaction ID epoch\n"));
+	printf(_("  -f, --force                      force update to be done\n"));
+	printf(_("  -l, --next-wal-file=WALFILE      set minimum starting location for new WAL\n"));
+	printf(_("  -m, --multixact-ids=MXID,MXID    set next and oldest multitransaction ID\n"));
+	printf(_("  -n, --dry-run                    no update, just show what would be done\n"));
+	printf(_("  -o, --next-oid=OID               set next OID\n"));
+	printf(_("  -O, --multixact-offset=OFFSET    set next multitransaction offset\n"));
+	printf(_("  -u, --oldest-transaction-id=XID  set oldest unfrozen transaction ID\n"));
+	printf(_("  -V, --version                    output version information, then exit\n"));
+	printf(_("  -x, --next-transaction-id=XID    set next transaction ID\n"));
+	printf(_("      --wal-segsize=SIZE           size of WAL segments, in megabytes\n"));
+	printf(_("  -?, --help                       show this help, then exit\n"));
 	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
 	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
 }
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf33..a4b6375403 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -44,6 +44,7 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 	bool		got_oid = false;
 	bool		got_multi = false;
 	bool		got_oldestmulti = false;
+	bool		got_oldestxid = false;
 	bool		got_mxoff = false;
 	bool		got_nextxlogfile = false;
 	bool		got_float8_pass_by_value = false;
@@ -312,6 +313,17 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 			cluster->controldata.chkpnt_nxtmulti = str2uint(p);
 			got_multi = true;
 		}
+		else if ((p = strstr(bufin, "Latest checkpoint's oldestXID:")) != NULL)
+		{
+			p = strchr(p, ':');
+
+			if (p == NULL || strlen(p) <= 1)
+				pg_fatal("%d: controldata retrieval problem\n", __LINE__);
+
+			p++;				/* remove ':' char */
+			cluster->controldata.chkpnt_oldstxid = str2uint(p);
+			got_oldestxid = true;
+		}
 		else if ((p = strstr(bufin, "Latest checkpoint's oldestMultiXid:")) != NULL)
 		{
 			p = strchr(p, ':');
@@ -544,7 +556,7 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 
 	/* verify that we got all the mandatory pg_control data */
 	if (!got_xid || !got_oid ||
-		!got_multi ||
+		!got_multi || !got_oldestxid ||
 		(!got_oldestmulti &&
 		 cluster->controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER) ||
 		!got_mxoff || (!live_check && !got_nextxlogfile) ||
@@ -575,6 +587,9 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 			cluster->controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
 			pg_log(PG_REPORT, "  latest checkpoint oldest MultiXactId\n");
 
+		if (!got_oldestxid)
+			pg_log(PG_REPORT, "  latest checkpoint oldestXID\n");
+
 		if (!got_mxoff)
 			pg_log(PG_REPORT, "  latest checkpoint next MultiXactOffset\n");
 
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index e23b8ca88d..950ff24980 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -473,6 +473,12 @@ copy_xact_xlog_xid(void)
 			  "\"%s/pg_resetwal\" -f -x %u \"%s\"",
 			  new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid,
 			  new_cluster.pgdata);
+	check_ok();
+	prep_status("Setting oldest XID for new cluster");
+	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
+			  "\"%s/pg_resetwal\" -f -u %u \"%s\"",
+			  new_cluster.bindir, old_cluster.controldata.chkpnt_oldstxid,
+			  new_cluster.pgdata);
 	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
 			  "\"%s/pg_resetwal\" -f -e %u \"%s\"",
 			  new_cluster.bindir, old_cluster.controldata.chkpnt_nxtepoch,
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index a5f71c5294..dd0204902c 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -207,6 +207,7 @@ typedef struct
 	uint32		chkpnt_nxtmulti;
 	uint32		chkpnt_nxtmxoff;
 	uint32		chkpnt_oldstMulti;
+	uint32		chkpnt_oldstxid;
 	uint32		align;
 	uint32		blocksz;
 	uint32		largesz;
#9 Bruce Momjian
bruce@momjian.us
In reply to: Drouvot, Bertrand (#8)
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

This patch has been applied back to 9.6 and will appear in the next
minor release.

---------------------------------------------------------------------------

On Tue, May 18, 2021 at 01:26:38PM +0200, Drouvot, Bertrand wrote:

Hi,

On 5/4/21 10:17 AM, Drouvot, Bertrand wrote:

Hi,

On 4/24/21 3:00 AM, Andres Freund wrote:

Hi,

On 2021-04-23 19:28:27 -0500, Justin Pryzby wrote:

This (combination of) thread(s) seems relevant.

Subject: pg_upgrade failing for 200+ million Large Objects
/messages/by-id/12601596dbbc4c01b86b4ac4d2bd4d48@EX13D05UWC001.ant.amazon.com
/messages/by-id/a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com
/messages/by-id/cc089cc3-fc43-9904-fdba-d830d8222145@enterprisedb.com

Huh. Thanks for digging these up.

Allows the user to provide a constant via pg_upgrade command-line, that
overrides the 2 billion constant in pg_resetxlog [1] thereby increasing the
(window of) Transaction IDs available for pg_upgrade to complete.

That seems the entirely the wrong approach to me, buying further into
the broken idea of inventing random wrong values for oldestXid.

We drive important things like the emergency xid limits off oldestXid. On
databases with tables that are older than ~147million xids (i.e. not even
affected by the default autovacuum_freeze_max_age) the current constant leads
to setting the oldestXid to a value *in the future*/wrapped around. Any
different different constant (or pg_upgrade parameter) will do that too in
other scenarios.

As far as I can tell there is precisely *no* correct behaviour here other than
exactly copying the oldestXid limit from the source database.

Please find attached a patch proposal doing so: it adds a new (- u)
parameter to pg_resetwal that allows to specify the oldest unfrozen XID to
set.
Then this new parameter is being used in pg_upgrade to copy the source
Latest checkpoint's oldestXID.

Questions:

* Should we keep the old behavior in case -x is being used without -u?
  (The proposed patch does not set an arbitrary oldestXID anymore in case
  -x is used.)
* Also shouldn't we ensure that the xid provided with -x or -u is >=
  FirstNormalTransactionId (Currently the only check is that it is != 0)?

Copy/pasting Andres feedback (Thanks Andres for this feedback) on those
questions from another thread [1].

I was also wondering if:

* We should keep the old behavior in case pg_resetwal -x is being used
  without -u? (The proposed patch does not set an arbitrary oldestXID
  anymore in case -x is used)

Andres: I don't think we should. I don't see anything in the old behaviour
worth maintaining.

* We should ensure that the xid provided with -x or -u is >=
  FirstNormalTransactionId (Currently the only check is that it is != 0)?

Andres: Applying TransactionIdIsNormal() seems like a good idea.

=> I am attaching a new version that makes use of TransactionIdIsNormal()
checks.

Andres: I think it's important to verify that the xid provided with -x is
within a reasonable range of the oldest xid.

=> What do you mean by "a reasonable range"?

Thanks

Bertrand

[1]: https://www.postgresql.org/message-id/20210517185646.pwe4klaufwmdhe2a%40alap3.anarazel.de

src/bin/pg_resetwal/pg_resetwal.c | 65 ++++++++++++++++++++++-----------------
src/bin/pg_upgrade/controldata.c  | 17 +++++++++-
src/bin/pg_upgrade/pg_upgrade.c   |  6 ++++
src/bin/pg_upgrade/pg_upgrade.h   |  1 +
4 files changed, 60 insertions(+), 29 deletions(-)
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 805dafef07..5e864760ed 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -65,6 +65,7 @@ static bool guessed = false;	/* T if we had to guess at any values */
static const char *progname;
static uint32 set_xid_epoch = (uint32) -1;
static TransactionId set_xid = 0;
+static TransactionId set_oldest_unfrozen_xid = 0;
static TransactionId set_oldest_commit_ts_xid = 0;
static TransactionId set_newest_commit_ts_xid = 0;
static Oid	set_oid = 0;
@@ -102,6 +103,7 @@ main(int argc, char *argv[])
{"next-oid", required_argument, NULL, 'o'},
{"multixact-offset", required_argument, NULL, 'O'},
{"next-transaction-id", required_argument, NULL, 'x'},
+		{"oldest-transaction-id", required_argument, NULL, 'u'},
{"wal-segsize", required_argument, NULL, 1},
{NULL, 0, NULL, 0}
};
@@ -135,7 +137,7 @@ main(int argc, char *argv[])
}
-	while ((c = getopt_long(argc, argv, "c:D:e:fl:m:no:O:x:", long_options, NULL)) != -1)
+	while ((c = getopt_long(argc, argv, "c:D:e:fl:m:no:O:x:u:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -176,9 +178,24 @@ main(int argc, char *argv[])
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
}
-				if (set_xid == 0)
+				if (!TransactionIdIsNormal(set_xid))
{
-					pg_log_error("transaction ID (-x) must not be 0");
+					pg_log_error("transaction ID (-x) must be greater or equal to %u", FirstNormalTransactionId);
+					exit(1);
+				}
+				break;
+
+			case 'u':
+				set_oldest_unfrozen_xid = strtoul(optarg, &endptr, 0);
+				if (endptr == optarg || *endptr != '\0')
+				{
+					pg_log_error("invalid argument for option %s", "-u");
+					fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+					exit(1);
+				}
+				if (!TransactionIdIsNormal(set_oldest_unfrozen_xid))
+				{
+					pg_log_error("oldest unfrozen transaction ID (-u) must be greater or equal to %u", FirstNormalTransactionId);
exit(1);
}
break;
@@ -429,21 +446,12 @@ main(int argc, char *argv[])
XidFromFullTransactionId(ControlFile.checkPointCopy.nextXid));

if (set_xid != 0)
- {
ControlFile.checkPointCopy.nextXid =
FullTransactionIdFromEpochAndXid(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextXid),
set_xid);

-		/*
-		 * For the moment, just set oldestXid to a value that will force
-		 * immediate autovacuum-for-wraparound.  It's not clear whether adding
-		 * user control of this is useful, so let's just do something that's
-		 * reasonably safe.  The magic constant here corresponds to the
-		 * maximum allowed value of autovacuum_freeze_max_age.
-		 */
-		ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
-		if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
-			ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
+	if (set_oldest_unfrozen_xid != 0) {
+		ControlFile.checkPointCopy.oldestXid = set_oldest_unfrozen_xid;
ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
}
@@ -1209,20 +1217,21 @@ usage(void)
printf(_("Usage:\n  %s [OPTION]... DATADIR\n\n"), progname);
printf(_("Options:\n"));
printf(_("  -c, --commit-timestamp-ids=XID,XID\n"
-			 "                                 set oldest and newest transactions bearing\n"
-			 "                                 commit timestamp (zero means no change)\n"));
-	printf(_(" [-D, --pgdata=]DATADIR          data directory\n"));
-	printf(_("  -e, --epoch=XIDEPOCH           set next transaction ID epoch\n"));
-	printf(_("  -f, --force                    force update to be done\n"));
-	printf(_("  -l, --next-wal-file=WALFILE    set minimum starting location for new WAL\n"));
-	printf(_("  -m, --multixact-ids=MXID,MXID  set next and oldest multitransaction ID\n"));
-	printf(_("  -n, --dry-run                  no update, just show what would be done\n"));
-	printf(_("  -o, --next-oid=OID             set next OID\n"));
-	printf(_("  -O, --multixact-offset=OFFSET  set next multitransaction offset\n"));
-	printf(_("  -V, --version                  output version information, then exit\n"));
-	printf(_("  -x, --next-transaction-id=XID  set next transaction ID\n"));
-	printf(_("      --wal-segsize=SIZE         size of WAL segments, in megabytes\n"));
-	printf(_("  -?, --help                     show this help, then exit\n"));
+			 "                                   set oldest and newest transactions bearing\n"
+			 "                                   commit timestamp (zero means no change)\n"));
+	printf(_(" [-D, --pgdata=]DATADIR            data directory\n"));
+	printf(_("  -e, --epoch=XIDEPOCH             set next transaction ID epoch\n"));
+	printf(_("  -f, --force                      force update to be done\n"));
+	printf(_("  -l, --next-wal-file=WALFILE      set minimum starting location for new WAL\n"));
+	printf(_("  -m, --multixact-ids=MXID,MXID    set next and oldest multitransaction ID\n"));
+	printf(_("  -n, --dry-run                    no update, just show what would be done\n"));
+	printf(_("  -o, --next-oid=OID               set next OID\n"));
+	printf(_("  -O, --multixact-offset=OFFSET    set next multitransaction offset\n"));
+	printf(_("  -u, --oldest-transaction-id=XID  set oldest unfrozen transaction ID\n"));
+	printf(_("  -V, --version                    output version information, then exit\n"));
+	printf(_("  -x, --next-transaction-id=XID    set next transaction ID\n"));
+	printf(_("      --wal-segsize=SIZE           size of WAL segments, in megabytes\n"));
+	printf(_("  -?, --help                       show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf33..a4b6375403 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -44,6 +44,7 @@ get_control_data(ClusterInfo *cluster, bool live_check)
bool		got_oid = false;
bool		got_multi = false;
bool		got_oldestmulti = false;
+	bool		got_oldestxid = false;
bool		got_mxoff = false;
bool		got_nextxlogfile = false;
bool		got_float8_pass_by_value = false;
@@ -312,6 +313,17 @@ get_control_data(ClusterInfo *cluster, bool live_check)
cluster->controldata.chkpnt_nxtmulti = str2uint(p);
got_multi = true;
}
+		else if ((p = strstr(bufin, "Latest checkpoint's oldestXID:")) != NULL)
+		{
+			p = strchr(p, ':');
+
+			if (p == NULL || strlen(p) <= 1)
+				pg_fatal("%d: controldata retrieval problem\n", __LINE__);
+
+			p++;				/* remove ':' char */
+			cluster->controldata.chkpnt_oldstxid = str2uint(p);
+			got_oldestxid = true;
+		}
else if ((p = strstr(bufin, "Latest checkpoint's oldestMultiXid:")) != NULL)
{
p = strchr(p, ':');
@@ -544,7 +556,7 @@ get_control_data(ClusterInfo *cluster, bool live_check)

/* verify that we got all the mandatory pg_control data */
if (!got_xid || !got_oid ||
- !got_multi ||
+ !got_multi || !got_oldestxid ||
(!got_oldestmulti &&
cluster->controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER) ||
!got_mxoff || (!live_check && !got_nextxlogfile) ||
@@ -575,6 +587,9 @@ get_control_data(ClusterInfo *cluster, bool live_check)
cluster->controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
pg_log(PG_REPORT, " latest checkpoint oldest MultiXactId\n");

+		if (!got_oldestxid)
+			pg_log(PG_REPORT, "  latest checkpoint oldestXID\n");
+
if (!got_mxoff)
pg_log(PG_REPORT, "  latest checkpoint next MultiXactOffset\n");
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index e23b8ca88d..950ff24980 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -473,6 +473,12 @@ copy_xact_xlog_xid(void)
"\"%s/pg_resetwal\" -f -x %u \"%s\"",
new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid,
new_cluster.pgdata);
+	check_ok();
+	prep_status("Setting oldest XID for new cluster");
+	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
+			  "\"%s/pg_resetwal\" -f -u %u \"%s\"",
+			  new_cluster.bindir, old_cluster.controldata.chkpnt_oldstxid,
+			  new_cluster.pgdata);
exec_prog(UTILITY_LOG_FILE, NULL, true, true,
"\"%s/pg_resetwal\" -f -e %u \"%s\"",
new_cluster.bindir, old_cluster.controldata.chkpnt_nxtepoch,
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index a5f71c5294..dd0204902c 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -207,6 +207,7 @@ typedef struct
uint32		chkpnt_nxtmulti;
uint32		chkpnt_nxtmxoff;
uint32		chkpnt_oldstMulti;
+	uint32		chkpnt_oldstxid;
uint32		align;
uint32		blocksz;
uint32		largesz;

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#10 Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Bruce Momjian (#9)
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

Hi,

On 7/27/21 4:39 AM, Bruce Momjian wrote:

This patch has been applied back to 9.6 and will appear in the next
minor release.

Thank you!

Bertrand

#11 Bruce Momjian
bruce@momjian.us
In reply to: Drouvot, Bertrand (#10)
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

On Tue, Jul 27, 2021 at 09:25:22AM +0200, Drouvot, Bertrand wrote:

Hi,

On 7/27/21 4:39 AM, Bruce Momjian wrote:

This patch has been applied back to 9.6 and will appear in the next
minor release.

Thank you!

Thank you for the patch --- this was a tricky problem, and frankly, I am
disappointed that we (and I) took so long to address this.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

Bruce Momjian <bruce@momjian.us> writes:

This patch has been applied back to 9.6 and will appear in the next
minor release.

I have just discovered that this patch broke pg_upgrade's ability
to upgrade from 8.4:

$ pg_upgrade -b ~/version84/bin -d ...
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
The source cluster lacks some required control information:
latest checkpoint oldestXID

Cannot continue without required control information, terminating
Failure, exiting

Sure enough, 8.4's pg_controldata doesn't print anything about
oldestXID, because that info wasn't there then.

Given the lack of field complaints, it's probably not worth trying
to do anything to restore that capability. But we really ought to
update pg_upgrade's code and docs in pre-v15 branches to say that
the minimum supported source version is 9.0.

regards, tom lane

#13 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#12)
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

On 2022-07-05 Tu 12:59, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

This patch has been applied back to 9.6 and will appear in the next
minor release.

I have just discovered that this patch broke pg_upgrade's ability
to upgrade from 8.4:

$ pg_upgrade -b ~/version84/bin -d ...
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
The source cluster lacks some required control information:
latest checkpoint oldestXID

Cannot continue without required control information, terminating
Failure, exiting

Sure enough, 8.4's pg_controldata doesn't print anything about
oldestXID, because that info wasn't there then.

Given the lack of field complaints, it's probably not worth trying
to do anything to restore that capability. But we really ought to
update pg_upgrade's code and docs in pre-v15 branches to say that
the minimum supported source version is 9.0.

So it's taken us a year to discover the issue :-( Perhaps if we're going
to say we support upgrades back to 9.0 we should have some testing to be
assured we don't break it without knowing like this. I'll see if I can
coax crake to do that - it already tests back to 9.2.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#13)
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

Andrew Dunstan <andrew@dunslane.net> writes:

So it's taken us a year to discover the issue :-( Perhaps if we're going
to say we support upgrades back to 9.0 we should have some testing to be
assured we don't break it without knowing like this. I'll see if I can
coax crake to do that - it already tests back to 9.2.

Hmm ... could you first look into why 09878cdd4 broke it? I'd supposed
that that was just detecting situations we must already have dealt with
in order for the pg_upgrade test to work, but crake's not happy.

regards, tom lane

#15 Peter Geoghegan
pg@bowt.ie
In reply to: Andrew Dunstan (#13)
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

On Tue, Jul 5, 2022 at 11:53 AM Andrew Dunstan <andrew@dunslane.net> wrote:

Sure enough, 8.4's pg_controldata doesn't print anything about
oldestXID, because that info wasn't there then.

Given the lack of field complaints, it's probably not worth trying
to do anything to restore that capability. But we really ought to
update pg_upgrade's code and docs in pre-v15 branches to say that
the minimum supported source version is 9.0.

So it's taken us a year to discover the issue :-(

I'm not surprised at all, given the history here. There were at least
a couple of bugs affecting how pg_upgrade carries forward information
about these cutoffs. See commits 74cf7d46 and a61daa14.

Actually, commit 74cf7d46 was where pg_resetxlog/pg_resetwal's -u
argument was first added, for use by pg_upgrade. That commit is only
about a year old, and was only backpatched to 9.6. Unfortunately the
previous approach to carrying forward oldestXID was an accident that
usually worked. So...yeah, things are bad here. At least we now have
the ability to detect any downstream problems that this might cause by
using pg_amcheck.
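
A minimal invocation, assuming PostgreSQL 14 or later on the upgraded
cluster and that the amcheck extension is installed in the target
databases:

pg_amcheck --all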

--
Peter Geoghegan

#16 Peter Geoghegan
pg@bowt.ie
In reply to: Peter Geoghegan (#15)
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

On Tue, Jul 5, 2022 at 12:41 PM Peter Geoghegan <pg@bowt.ie> wrote:

Actually, commit 74cf7d46 was where pg_resetxlog/pg_resetwal's -u
argument was first added, for use by pg_upgrade. That commit is only
about a year old, and was only backpatched to 9.6.

I just realized that this thread was where that work was first
discussed. That explains why it took a year to discover that we broke
8.4!

On further reflection I think that breaking pg_upgrade for 8.4 might
have been a good thing. The issue was fairly visible and obvious if
you actually ran into it, which is vastly preferable to what would
have happened before commit 74cf7d46.
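
The silent variant was really only detectable by comparing the control
file's cutoff with what the catalogs report, i.e. something like this
(illustrative only, adjust paths and database as needed):

$ pg_controldata "$PGDATA" | grep oldestXID
$ psql -Atc "SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC"

and hardly anybody looks at the first of those, which is why a loud failure
at upgrade time seems like the better outcome.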

--
Peter Geoghegan

#17Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#14)
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

On 2022-07-05 Tu 15:17, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

So it's taken us a year to discover the issue :-( Perhaps if we're going
to say we support upgrades back to 9.0 we should have some testing in place
so we don't unknowingly break it like this. I'll see if I can coax crake to
do that - it already tests back to 9.2.

Hmm ... could you first look into why 09878cdd4 broke it? I'd supposed
that that was just detecting situations we must already have dealt with
in order for the pg_upgrade test to work, but crake's not happy.

It's complaining about this:

andrew@emma:HEAD $ cat ./inst/REL9_6_STABLE-20220705T160820.039/incompatible_polymorphics.txt
In database: regression
  aggregate: public.first_el_agg_f8(double precision)

I can have TestUpgradeXVersion.pm search for and remove offending
functions, if that's the right fix.
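
(Presumably something along the lines of

$ psql -d regression -c 'DROP AGGREGATE public.first_el_agg_f8(double precision);'

run against the old install before the upgrade step, assuming that dropping
the offending objects rather than rewriting them is acceptable.)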

I note too that drongo is failing similarly, but its pg_upgrade output
directory is missing, so 4fff78f009 seems possibly shy of a load w.r.t.
MSVC. I will investigate.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#17)
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

Andrew Dunstan <andrew@dunslane.net> writes:

On 2022-07-05 Tu 15:17, Tom Lane wrote:

Hmm ... could you first look into why 09878cdd4 broke it? I'd supposed
that that was just detecting situations we must already have dealt with
in order for the pg_upgrade test to work, but crake's not happy.

It's complaining about this:

andrew@emma:HEAD $ cat ./inst/REL9_6_STABLE-20220705T160820.039/incompatible_polymorphics.txt
In database: regression
  aggregate: public.first_el_agg_f8(double precision)

Thanks.

I can have TestUpgradeXVersion.pm search for and remove offending
functions, if that's the right fix.

I'm not sure. It seems like the new check must be too strict,
because it was only meant to detect cases that would cause a subsequent
dump/reload failure, and evidently this did not. I'll have to look
closer to figure out what to do. Anyway, it's off topic for this
thread ...

regards, tom lane

#19Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#12)
1 attachment(s)
Re: [UNVERIFIED SENDER] pg_upgrade can result in early wraparound on databases with high transaction load

On 5 Jul 2022, at 18:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Given the lack of field complaints, it's probably not worth trying
to do anything to restore that capability. But we really ought to
update pg_upgrade's code and docs in pre-v15 branches to say that
the minimum supported source version is 9.0.

(reviving an old thread from the TODO)

Since we never got around to doing this we still refer to 8.4 as a possible
upgrade path in v14 and older.

There seem to be two alternatives here: either we bump the minimum version in
v14-v12 to 9.0, which is the technical limitation introduced by 695b4a113ab, or
we follow the direction taken by e469f0aaf3c and set 9.2. e469f0aaf3c raised
the minimum supported version to 9.2 based on the difficulty of compiling
anything older with a modern toolchain.

It can be argued that making a change we don't cover with testing is unwise,
but we clearly don't test the current code either since it's broken.

The attached takes the conservative approach of raising the minimum supported
version to 9.0 while leaving the code to handle 8.4 in place. While it can be
removed, the risk/reward tradeoff of gutting code in backbranches doesn't seem
appealing since the code will be unreachable with this check anyway.
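
With the patch, pointing one of those pg_upgrade versions at a pre-9.0
source should fail up front in the version check rather than on the missing
oldestXID, i.e. roughly (output abbreviated):

$ pg_upgrade -b ~/version84/bin -d ... --check
This utility can only upgrade from PostgreSQL version 9.0 and later.
Failure, exiting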

Thoughts?

--
Daniel Gustafsson

Attachments:

0001-Refuse-upgrades-from-pre-9.0-clusters.patch
From d0b747eb43980e4e639fd3844357e0ccbffbd705 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Thu, 16 May 2024 07:43:50 +0200
Subject: [PATCH] Refuse upgrades from pre-9.0 clusters

Commit 695b4a113ab added a dependency on retrieving oldestxid from
pg_control, which only exists in 9.0 and onwards, but the check for
8.4 as the oldest version was retained. Since there have been few if
any complaints about 8.4 upgrades not working, fix by setting 9.0 as
the oldest version supported rather than resurrecting 8.4 support.

Backpatch to all supported versions.

Discussion: https://postgr.es/m/1973418.1657040382@sss.pgh.pa.us
Backpatch-through: v12
---
 doc/src/sgml/ref/pgupgrade.sgml | 2 +-
 src/bin/pg_upgrade/check.c      | 9 +++++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index d401053edf..8add90e8ed 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -67,7 +67,7 @@ PostgreSQL documentation
  </para>
 
   <para>
-   pg_upgrade supports upgrades from 8.4.X and later to the current
+   pg_upgrade supports upgrades from 9.0.X and later to the current
    major release of <productname>PostgreSQL</productname>, including snapshot and beta releases.
   </para>
  </refsect1>
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index dc060418e5..05a94087c5 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -316,8 +316,13 @@ check_cluster_versions(void)
 	 * upgrades
 	 */
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
-		pg_fatal("This utility can only upgrade from PostgreSQL version 8.4 and later.\n");
+	/*
+	 * The minimum version supported then this code shipped in a major version
+	 * was 8.4. This has since been raised to 9.0, but the support code for
+	 * dealing with 8.4 remains to avoid refactoring in a backbranch.
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) < 900)
+		pg_fatal("This utility can only upgrade from PostgreSQL version 9.0 and later.\n");
 
 	/* Only current PG version is supported as a target */
 	if (GET_MAJOR_VERSION(new_cluster.major_version) != GET_MAJOR_VERSION(PG_VERSION_NUM))
-- 
2.39.3 (Apple Git-146)

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#19)
Re: [UNVERIFIED SENDER] pg_upgrade can result in early wraparound on databases with high transaction load

Daniel Gustafsson <daniel@yesql.se> writes:

On 5 Jul 2022, at 18:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Given the lack of field complaints, it's probably not worth trying
to do anything to restore that capability. But we really ought to
update pg_upgrade's code and docs in pre-v15 branches to say that
the minimum supported source version is 9.0.

(reviving an old thread from the TODO)

Since we never got around to doing this we still refer to 8.4 as a possible
upgrade path in v14 and older.

Oh, yeah, that seems to have fallen through a crack.

The attached takes the conservative approach of raising the minimum supported
version to 9.0 while leaving the code to handle 8.4 in place. While it can be
removed, the risk/reward tradeoff of gutting code in backbranches doesn't seem
appealing since the code will be unreachable with this check anyway.

Yeah, it's not worth working harder than this. I do see one typo
in your comment: s/supported then/supported when/. LGTM otherwise.

regards, tom lane

#21Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#20)
Re: [UNVERIFIED SENDER] pg_upgrade can result in early wraparound on databases with high transaction load

On 16 May 2024, at 19:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, it's not worth working harder than this. I do see one typo
in your comment: s/supported then/supported when/. LGTM otherwise.

Thanks for the review; I've pushed this (with the fix from above) to 14 through 12.

--
Daniel Gustafsson