[PATCH] Clarify the behavior of the system when approaching XID wraparound
Hi hackers,
While playing with 64-bit XIDs [1]https://commitfest.postgresql.org/41/3594/ my attention was drawn by the
following statement in the docs [2]https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND:
"""
If these warnings are ignored, the system will shut down and refuse to
start any new transactions once there are fewer than three million
transactions left until wraparound.
"""
I decided to check this.
Unfortunately it can't be done easily e.g. by modifying
ShmemVariableCache->nextXid in gdb, because the system will PANIC with
something like "could not access status of transaction 12345".
Hopefully [3]https://commitfest.postgresql.org/41/3729/ will change the situation someday.
Meanwhile I choose the hard way. In one session I did:
```
CREATE TABLE phonebook(
"id" SERIAL PRIMARY KEY NOT NULL,
"name" NAME NOT NULL,
"phone" INT NOT NULL);
BEGIN;
INSERT INTO phonebook VALUES (1, 'Alex', 123);
-- don't commit!
```
Then I did the following:
```
echo "SELECT pg_current_xact_id();" > t.sql
pgbench -j 8 -c 8 -f t.sql -T 86400 eax
```
After 20-24 hours on the typical hardware (perhaps faster if only I
didn't forget to use `synchronous_commit = off`) pgbench will use up
the XID pool. The old tuples can't be frozen because the transaction
we created in the beginning is still in progress. So now we can
observe what actually happens when the system reaches xidStopLimit.
Firstly, the system doesn't shutdown as the documentation says.
Secondly, it executes new transactions just fine as long as these
transactions don't allocate new XIDs.
XIDs are allocated not for every transaction but rather lazily, when
needed (see backend_xid in pg_stat_activity). A transaction doesn't
need an assigned XID for checking the visibility of the tuples. Rather
it uses xmin horizon, and only when using an isolation level above
READ COMMITTED, see backend_xmin in pg_stat_activity. Assigning a xmin
horizon doesn't increase nextXid.
As a result, PostgreSQL can still execute read-only transactions even
after reaching xidStopLimit. Similarly to how it can do this on hot
standby replicas without having conflicts with the leader server.
Thirdly, if there was a transaction created before reaching
xidStopLimit, it will continue to execute after reaching xidStopLimit,
and it can be successfully committed.
All in all, the actual behavior is far from "system shutdown" and
"refusing to start any new transactions". It's closer to entering
read-only mode, similarly to what hot standbys allow to do.
The proposed patchset changes the documentation and the error messages
accordingly, making them less misleading. 0001 corrects the
documentation but doesn't touch the code. 0002 and 0003 correct the
messages shown when approaching xidWrapLimit and xidWarnLimit
accordingly.
Thoughts?
[1]: https://commitfest.postgresql.org/41/3594/
[2]: https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
[3]: https://commitfest.postgresql.org/41/3729/
--
Best regards,
Aleksander Alekseev
Attachments:
v1-0003-Fix-the-message-in-case-of-exceeding-xidWarnLimit.patchapplication/octet-stream; name=v1-0003-Fix-the-message-in-case-of-exceeding-xidWarnLimit.patchDownload
From 83102eaa60f4f79c64307925a8531f52b8d81ae1 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 12:49:39 +0300
Subject: [PATCH v1 3/3] Fix the message in case of exceeding xidWarnLimit
Previously the message said that the user should execute VACCUM in order to
prevent the database shutdown. This gave a wrong impression that the entire
database may become unavailable.
This is not what happens though. Rather the system refuses to allocate
new XIDs. From the user perspective it looks like entering a read-only mode,
similarly to what the user sees on hot standby replicas.
This patch changes the message accordingly.
Author: Aleksander Alekseev
Reviewed-by: TODO FIXME
Discussion: TODO FIXME
---
doc/src/sgml/maintenance.sgml | 2 +-
src/backend/access/transam/varsup.c | 8 ++++----
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 48330605c7..d21675fac9 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -655,7 +655,7 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
<programlisting>
WARNING: database "mydb" must be vacuumed within 39985967 transactions
-HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
+HINT: To prevent entering read-only mode, execute a database-wide VACUUM in that database.
</programlisting>
(A manual <command>VACUUM</command> should fix the problem, as suggested by the
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index fece6cf31b..485d8ebf81 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -148,14 +148,14 @@ GetNewTransactionId(bool isSubXact)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
@@ -463,14 +463,14 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
--
2.39.0
v1-0002-Fix-the-message-in-case-of-approaching-xidWrapLim.patchapplication/octet-stream; name=v1-0002-Fix-the-message-in-case-of-approaching-xidWrapLim.patchDownload
From 3ebf73674345f6378adb5890338e31bf8afd63ee Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 12:34:18 +0300
Subject: [PATCH v1 2/3] Fix the message in case of approaching xidWrapLimit
Prior to this commit the error message said that the database doesn't accept
new commands when it approaches xidWrapLimit. This is not true. The commands
that don't allocate new XIDs can still be executed. E.g. read-only
transactions continue to work.
This commit clarifies this. Additionally it makes the error messages consistent
with the similar messages for MultiXactIds which interestingly were accurate.
Author: Aleksander Alekseev
Reviewed-by: TODO FIXME
Discussion: TODO FIXME
---
doc/src/sgml/maintenance.sgml | 2 +-
src/backend/access/transam/varsup.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 2aa072ba89..48330605c7 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -666,7 +666,7 @@ HINT: To avoid a database shutdown, execute a database-wide VACUUM in that data
once there are fewer than three million transactions left until wraparound:
<programlisting>
-ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
+ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 334adac09e..fece6cf31b 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -126,14 +126,14 @@ GetNewTransactionId(bool isSubXact)
if (oldest_datname)
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
--
2.39.0
v1-0001-Correct-the-docs-about-preventing-XID-wraparound.patchapplication/octet-stream; name=v1-0001-Correct-the-docs-about-preventing-XID-wraparound.patchDownload
From b82db1c55349367cd74d931915bb064ccf0bda1e Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 12:23:30 +0300
Subject: [PATCH v1 1/3] Correct the docs about preventing XID wraparound
Prior to this commit the documentation was somewhat misleading. It stated that
the system will "shutdown" in order to prevent XID wraparound. This is not
what actually happens though.
Actually the system refuses to allocate new XIDs. XIDs are allocated lazily
and not needed for checking the visibility rules. Thus read-only transaction
will continues to execute. Additionally, read-write transactions that are
in progress and have an allocated XID will continue to execute and can be
committed.
Author: Aleksander Alekseev
Reviewed-by: TODO FIXME
Discussion: TODO FIXME
---
doc/src/sgml/maintenance.sgml | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 759ea5ac9c..2aa072ba89 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -662,24 +662,25 @@ HINT: To avoid a database shutdown, execute a database-wide VACUUM in that data
hint; but note that the <command>VACUUM</command> must be performed by a
superuser, else it will fail to process system catalogs and thus not
be able to advance the database's <structfield>datfrozenxid</structfield>.)
- If these warnings are
- ignored, the system will shut down and refuse to start any new
- transactions once there are fewer than three million transactions left
- until wraparound:
+ If these warnings are ignored, the system will refuse to allocate new XIDs
+ once there are fewer than three million transactions left until wraparound:
<programlisting>
ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
+ In this condition the system can still execute read-only transactions.
+ The active transactions will continue to execute and will be able to
+ commit.
+
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However, since the system will not
- execute commands once it has gone into the safety shutdown mode,
+ required <command>VACUUM</command> commands. However
the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
- in single-user mode. See the <xref linkend="app-postgres"/> reference
- page for details about using single-user mode.
+ mode to execute <command>VACUUM</command>. See the
+ <xref linkend="app-postgres"/> reference page for details about using
+ single-user mode.
</para>
<sect3 id="vacuum-for-multixact-wraparound">
--
2.39.0
Hi hackers,
The proposed patchset changes the documentation and the error messages
accordingly, making them less misleading. 0001 corrects the
documentation but doesn't touch the code. 0002 and 0003 correct the
messages shown when approaching xidWrapLimit and xidWarnLimit
accordingly.
A colleague of mine, Oleksii Kliukin, pointed out that the
recommendation about running VACUUM in a single-user mode is also
outdated, as it was previously reported in [1]/messages/by-id/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA@mail.gmail.com. I didn't believe it at
first and decided to double-check:
```
=# select * from phonebook;
id | name | phone
----+---------+-------
1 | Alex | 123
5 | Charlie | 789
2 | Bob | 456
6 | Ololo | 789
(4 rows)
=# insert into phonebook values (7, 'Trololo', 987);
ERROR: database is not accepting commands to avoid wraparound data
loss in database "template1"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions,
or drop stale replication slots.
=# VACUUM FREEZE;
VACUUM
=# insert into phonebook values (7, 'Trololo', 987);
INSERT 0 1
=# SELECT current_setting('wal_level');
current_setting
-----------------
logical
```
Unfortunately the [1]/messages/by-id/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA@mail.gmail.com discussion went nowhere. So I figured it would
be appropriate to add corresponding changes to the proposed patchset
since it's relevant and is registered in the CF app already. PFA
patchset v2 which now also includes 0004.
[1]: /messages/by-id/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA@mail.gmail.com
--
Best regards,
Aleksander Alekseev
Attachments:
v2-0001-Correct-the-docs-about-preventing-XID-wraparound.patchapplication/octet-stream; name=v2-0001-Correct-the-docs-about-preventing-XID-wraparound.patchDownload
From 03e6f42077e25de71ceffeb839c09f4530bc1aa8 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 12:23:30 +0300
Subject: [PATCH v2 1/4] Correct the docs about preventing XID wraparound
Prior to this commit the documentation was somewhat misleading. It stated that
the system will "shutdown" in order to prevent XID wraparound. This is not
what actually happens though.
Actually the system refuses to allocate new XIDs. XIDs are allocated lazily
and not needed for checking the visibility rules. Thus read-only transaction
will continues to execute. Additionally, read-write transactions that are
in progress and have an allocated XID will continue to execute and can be
committed.
Author: Aleksander Alekseev
Reviewed-by: TODO FIXME
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 759ea5ac9c..2aa072ba89 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -662,24 +662,25 @@ HINT: To avoid a database shutdown, execute a database-wide VACUUM in that data
hint; but note that the <command>VACUUM</command> must be performed by a
superuser, else it will fail to process system catalogs and thus not
be able to advance the database's <structfield>datfrozenxid</structfield>.)
- If these warnings are
- ignored, the system will shut down and refuse to start any new
- transactions once there are fewer than three million transactions left
- until wraparound:
+ If these warnings are ignored, the system will refuse to allocate new XIDs
+ once there are fewer than three million transactions left until wraparound:
<programlisting>
ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
+ In this condition the system can still execute read-only transactions.
+ The active transactions will continue to execute and will be able to
+ commit.
+
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However, since the system will not
- execute commands once it has gone into the safety shutdown mode,
+ required <command>VACUUM</command> commands. However
the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
- in single-user mode. See the <xref linkend="app-postgres"/> reference
- page for details about using single-user mode.
+ mode to execute <command>VACUUM</command>. See the
+ <xref linkend="app-postgres"/> reference page for details about using
+ single-user mode.
</para>
<sect3 id="vacuum-for-multixact-wraparound">
--
2.39.0
v2-0003-Fix-the-message-in-case-of-exceeding-xidWarnLimit.patchapplication/octet-stream; name=v2-0003-Fix-the-message-in-case-of-exceeding-xidWarnLimit.patchDownload
From 5ce785a5617aa9ab38d4a414f1a927d8de292abf Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 12:49:39 +0300
Subject: [PATCH v2 3/4] Fix the message in case of exceeding xidWarnLimit
Previously the message said that the user should execute VACCUM in order to
prevent the database shutdown. This gave a wrong impression that the entire
database may become unavailable.
This is not what happens though. Rather the system refuses to allocate
new XIDs. From the user perspective it looks like entering a read-only mode,
similarly to what the user sees on hot standby replicas.
This patch changes the message accordingly.
Author: Aleksander Alekseev
Reviewed-by: TODO FIXME
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 2 +-
src/backend/access/transam/varsup.c | 8 ++++----
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 48330605c7..d21675fac9 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -655,7 +655,7 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
<programlisting>
WARNING: database "mydb" must be vacuumed within 39985967 transactions
-HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
+HINT: To prevent entering read-only mode, execute a database-wide VACUUM in that database.
</programlisting>
(A manual <command>VACUUM</command> should fix the problem, as suggested by the
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index fece6cf31b..485d8ebf81 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -148,14 +148,14 @@ GetNewTransactionId(bool isSubXact)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
@@ -463,14 +463,14 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
--
2.39.0
v2-0004-Don-t-recommend-running-VACUUM-in-a-single-user-m.patchapplication/octet-stream; name=v2-0004-Don-t-recommend-running-VACUUM-in-a-single-user-m.patchDownload
From 7656a445b24914d402e00465de8491453c83c89d Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 15:37:12 +0300
Subject: [PATCH v2 4/4] Don't recommend running VACUUM in a single-user mode
when reaching xidStopLimit
This recommendation is outdated for some time now. A regular VACUUM or
VACUUM FREEZE works just fine.
Author: Aleksander Alekseev
Reported-by: Hannu Krosing
Reviewed-by: TODO FIXME
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
Discussion: https://postgr.es/m/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 8 ++------
src/backend/access/transam/varsup.c | 8 +++-----
2 files changed, 5 insertions(+), 11 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index d21675fac9..3b78098232 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -667,7 +667,7 @@ HINT: To prevent entering read-only mode, execute a database-wide VACUUM in tha
<programlisting>
ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
-HINT: Stop the postmaster and vacuum that database in single-user mode.
+HINT: VACUUM or VACUUM FREEZE that database.
</programlisting>
In this condition the system can still execute read-only transactions.
@@ -676,11 +676,7 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However
- the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. See the
- <xref linkend="app-postgres"/> reference page for details about using
- single-user mode.
+ required <command>VACUUM</command> commands.
</para>
<sect3 id="vacuum-for-multixact-wraparound">
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 485d8ebf81..a79144d445 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -86,9 +86,7 @@ GetNewTransactionId(bool isSubXact)
*
* If we're past xidVacLimit, start trying to force autovacuum cycles.
* If we're past xidWarnLimit, start issuing warnings.
- * If we're past xidStopLimit, refuse to execute transactions, unless
- * we are running in single-user mode (which gives an escape hatch
- * to the DBA who somehow got past the earlier defenses).
+ * If we're past xidStopLimit, refuse to allocate new XIDs.
*
* Note that this coding also appears in GetNewMultiXactId.
*----------
@@ -128,14 +126,14 @@ GetNewTransactionId(bool isSubXact)
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("VACUUM or VACUUM FREEZE that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("VACUUM or VACUUM FREEZE that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
--
2.39.0
v2-0002-Fix-the-message-in-case-of-approaching-xidWrapLim.patchapplication/octet-stream; name=v2-0002-Fix-the-message-in-case-of-approaching-xidWrapLim.patchDownload
From 8de345838a987de5e0ac74c7bd70dc0d746a4574 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 12:34:18 +0300
Subject: [PATCH v2 2/4] Fix the message in case of approaching xidWrapLimit
Prior to this commit the error message said that the database doesn't accept
new commands when it approaches xidWrapLimit. This is not true. The commands
that don't allocate new XIDs can still be executed. E.g. read-only
transactions continue to work.
This commit clarifies this. Additionally it makes the error messages consistent
with the similar messages for MultiXactIds which interestingly were accurate.
Author: Aleksander Alekseev
Reviewed-by: TODO FIXME
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 2 +-
src/backend/access/transam/varsup.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 2aa072ba89..48330605c7 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -666,7 +666,7 @@ HINT: To avoid a database shutdown, execute a database-wide VACUUM in that data
once there are fewer than three million transactions left until wraparound:
<programlisting>
-ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
+ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 334adac09e..fece6cf31b 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -126,14 +126,14 @@ GetNewTransactionId(bool isSubXact)
if (oldest_datname)
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
--
2.39.0
On Mon, Jan 16, 2023 at 03:50:57PM +0300, Aleksander Alekseev wrote:
Hi hackers,
The proposed patchset changes the documentation and the error messages
accordingly, making them less misleading. 0001 corrects the
documentation but doesn't touch the code. 0002 and 0003 correct the
messages shown when approaching xidWrapLimit and xidWarnLimit
accordingly.A colleague of mine, Oleksii Kliukin, pointed out that the
recommendation about running VACUUM in a single-user mode is also
outdated, as it was previously reported in [1]. I didn't believe it at
first and decided to double-check:
and again at:
/messages/by-id/CA+TgmoYPfofQmRtUan=A3aWE9wFsJaOFr+W_ys2pPkNPr-2FZw@mail.gmail.com
Unfortunately the [1] discussion went nowhere.
likewise...
So I figured it would be appropriate to add corresponding changes to
the proposed patchset since it's relevant and is registered in the CF
app already. PFA patchset v2 which now also includes 0004.[1]:
/messages/by-id/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA@mail.gmail.com
I suggest to resend this with a title like the 2021 thread [1] (I was
unable to find this just now when I looked)
| doc: stop telling users to "vacuum that database in single-user mode"
And copy the participants of the previous two iterations of this thread.
--
Justin
Thanks for picking up this badly-needed topic again! I was irresponsible
last year and let it fall off my radar, but I'm looking at the patches, as
well as revisiting discussions from the last four (!?) years that didn't
lead to action.
0001:
+ In this condition the system can still execute read-only transactions.
+ The active transactions will continue to execute and will be able to
+ commit.
This is ambiguous. I'd first say that any transactions already started can
continue, and then say that only new read-only transactions can be started.
0004:
-HINT: Stop the postmaster and vacuum that database in single-user mode.
+HINT: VACUUM or VACUUM FREEZE that database.
VACUUM FREEZE is worse and should not be mentioned, since it does
unnecessary work. Emergency vacuum is not school -- you don't get extra
credit for doing unnecessary work.
Also, we may consider adding a boxed NOTE warning specifically against
single-user mode, especially if this recommendation will change in at least
some minor releases so people may not hear about it. See also [1]/messages/by-id/CA+Tgmoadjx+r8_gGbbnNifL6vEyjZntiQRPzyixrUihvtZ5jdQ@mail.gmail.com.
- * If we're past xidStopLimit, refuse to execute transactions, unless
- * we are running in single-user mode (which gives an escape hatch
- * to the DBA who somehow got past the earlier defenses).
+ * If we're past xidStopLimit, refuse to allocate new XIDs.
This patch doesn't completely get rid of the need for single-user mode, so
it should keep all information about it. If a DBA wanted to e.g. drop or
truncate a table to save vacuum time, it is still possible to do it in
single-user mode, so the escape hatch is still useful.
In swapping this topic back in my head, I also saw [2]/messages/by-id/CA+Tgmob1QCMJrHwRBK8HZtGsr+6cJANRQw2mEgJ9e=D+z7cOsw@mail.gmail.com where Robert
suggested
"that old prepared transactions and stale replication
slots should be emphasized more prominently. Maybe something like:
HINT: Commit or roll back old prepared transactions, drop stale
replication slots, or kill long-running sessions.
Ensure that autovacuum is progressing, or run a manual database-wide
VACUUM."
That sounds like a good direction to me. There is more we could do here to
make the message more specific [3]/messages/by-id/20190504023015.5mgpbl27tld4irw5@alap3.anarazel.de[4]/messages/by-id/20220204013539.qdegpqzvayq3d4y2@alap3.anarazel.de[5]/messages/by-id/20220220045757.GA3733812@rfd.leadboat.com, but the patches here are in the
right direction.
Note for possible backpatching: It seems straightforward to go back to
PG14, which has the failsafe, but we should have better testing in place
first. There is a patch in this CF to make it easier to get close to
wraparound, so I'll look at what it does as well.
[1]: /messages/by-id/CA+Tgmoadjx+r8_gGbbnNifL6vEyjZntiQRPzyixrUihvtZ5jdQ@mail.gmail.com
/messages/by-id/CA+Tgmoadjx+r8_gGbbnNifL6vEyjZntiQRPzyixrUihvtZ5jdQ@mail.gmail.com
[2]: /messages/by-id/CA+Tgmob1QCMJrHwRBK8HZtGsr+6cJANRQw2mEgJ9e=D+z7cOsw@mail.gmail.com
/messages/by-id/CA+Tgmob1QCMJrHwRBK8HZtGsr+6cJANRQw2mEgJ9e=D+z7cOsw@mail.gmail.com
[3]: /messages/by-id/20190504023015.5mgpbl27tld4irw5@alap3.anarazel.de
/messages/by-id/20190504023015.5mgpbl27tld4irw5@alap3.anarazel.de
[4]: /messages/by-id/20220204013539.qdegpqzvayq3d4y2@alap3.anarazel.de
/messages/by-id/20220204013539.qdegpqzvayq3d4y2@alap3.anarazel.de
[5]: /messages/by-id/20220220045757.GA3733812@rfd.leadboat.com
/messages/by-id/20220220045757.GA3733812@rfd.leadboat.com
--
John Naylor
EDB: http://www.enterprisedb.com
Hi John,
Thanks for picking up this badly-needed topic again!
Many thanks for the review!
0001:
+ In this condition the system can still execute read-only transactions. + The active transactions will continue to execute and will be able to + commit.This is ambiguous. I'd first say that any transactions already started can continue, and then say that only new read-only transactions can be started.
Fixed.
0004:
-HINT: Stop the postmaster and vacuum that database in single-user mode. +HINT: VACUUM or VACUUM FREEZE that database.VACUUM FREEZE is worse and should not be mentioned, since it does unnecessary work. Emergency vacuum is not school -- you don't get extra credit for doing unnecessary work.
Fixed.
Also, we may consider adding a boxed NOTE warning specifically against single-user mode, especially if this recommendation will change in at least some minor releases so people may not hear about it. See also [1].
Done.
- * If we're past xidStopLimit, refuse to execute transactions, unless - * we are running in single-user mode (which gives an escape hatch - * to the DBA who somehow got past the earlier defenses). + * If we're past xidStopLimit, refuse to allocate new XIDs.This patch doesn't completely get rid of the need for single-user mode, so it should keep all information about it. If a DBA wanted to e.g. drop or truncate a table to save vacuum time, it is still possible to do it in single-user mode, so the escape hatch is still useful.
Fixed.
In swapping this topic back in my head, I also saw [2] where Robert suggested
"that old prepared transactions and stale replication
slots should be emphasized more prominently. Maybe something like:HINT: Commit or roll back old prepared transactions, drop stale
replication slots, or kill long-running sessions.
Ensure that autovacuum is progressing, or run a manual database-wide VACUUM."
It looks like the hint regarding replication slots was added at some
point. Currently we have:
```
errhint( [...]
"You might also need to commit or roll back old prepared
transactions, or drop stale replication slots.")));
```
So I choose to keep it as is for now. Please let me know if you think
we should also add a suggestion to kill long-running sessions, etc.
--
Best regards,
Aleksander Alekseev
Attachments:
v3-0001-Correct-the-docs-about-preventing-XID-wraparound.patchapplication/octet-stream; name=v3-0001-Correct-the-docs-about-preventing-XID-wraparound.patchDownload
From 7f534bdb15e89ca3ed88712c457cee7d69892da3 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 12:23:30 +0300
Subject: [PATCH v3 1/4] Correct the docs about preventing XID wraparound
Prior to this commit the documentation was somewhat misleading. It stated that
the system will "shutdown" in order to prevent XID wraparound. This is not
what actually happens though.
Actually the system refuses to allocate new XIDs. XIDs are allocated lazily
and not needed for checking the visibility rules. Thus read-only transaction
will continues to execute. Additionally, read-write transactions that are
in progress and have an allocated XID will continue to execute and can be
committed.
Author: Aleksander Alekseev
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 9cf9d030a8..92d99a8f8c 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -663,24 +663,24 @@ HINT: To avoid a database shutdown, execute a database-wide VACUUM in that data
hint; but note that the <command>VACUUM</command> must be performed by a
superuser, else it will fail to process system catalogs and thus not
be able to advance the database's <structfield>datfrozenxid</structfield>.)
- If these warnings are
- ignored, the system will shut down and refuse to start any new
- transactions once there are fewer than three million transactions left
- until wraparound:
+ If these warnings are ignored, the system will refuse to allocate new XIDs
+ once there are fewer than three million transactions left until wraparound:
<programlisting>
ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
+ In this condition any transactions already started can continue.
+ Only new read-only transactions can be started.
+
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However, since the system will not
- execute commands once it has gone into the safety shutdown mode,
+ required <command>VACUUM</command> commands. However
the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
- in single-user mode. See the <xref linkend="app-postgres"/> reference
- page for details about using single-user mode.
+ mode to execute <command>VACUUM</command>. See the
+ <xref linkend="app-postgres"/> reference page for details about using
+ single-user mode.
</para>
<sect3 id="vacuum-for-multixact-wraparound">
--
2.39.2
v3-0002-Fix-the-message-in-case-of-approaching-xidWrapLim.patchapplication/octet-stream; name=v3-0002-Fix-the-message-in-case-of-approaching-xidWrapLim.patchDownload
From 669709ad943149b39ec9db06e4603dadf2dd146c Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 12:34:18 +0300
Subject: [PATCH v3 2/4] Fix the message in case of approaching xidWrapLimit
Prior to this commit the error message said that the database doesn't accept
new commands when it approaches xidWrapLimit. This is not true. The commands
that don't allocate new XIDs can still be executed. E.g. read-only
transactions continue to work.
This commit clarifies this. Additionally it makes the error messages consistent
with the similar messages for MultiXactIds which interestingly were accurate.
Author: Aleksander Alekseev
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 2 +-
src/backend/access/transam/varsup.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 92d99a8f8c..b0336b12e5 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -667,7 +667,7 @@ HINT: To avoid a database shutdown, execute a database-wide VACUUM in that data
once there are fewer than three million transactions left until wraparound:
<programlisting>
-ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
+ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 334adac09e..fece6cf31b 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -126,14 +126,14 @@ GetNewTransactionId(bool isSubXact)
if (oldest_datname)
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
--
2.39.2
v3-0003-Fix-the-message-in-case-of-exceeding-xidWarnLimit.patchapplication/octet-stream; name=v3-0003-Fix-the-message-in-case-of-exceeding-xidWarnLimit.patchDownload
From 23f01af9c2e807f4d4bb175a25ef9273f38388b9 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 12:49:39 +0300
Subject: [PATCH v3 3/4] Fix the message in case of exceeding xidWarnLimit
Previously the message said that the user should execute VACCUM in order to
prevent the database shutdown. This gave a wrong impression that the entire
database may become unavailable.
This is not what happens though. Rather the system refuses to allocate
new XIDs. From the user perspective it looks like entering a read-only mode,
similarly to what the user sees on hot standby replicas.
This patch changes the message accordingly.
Author: Aleksander Alekseev
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 2 +-
src/backend/access/transam/varsup.c | 8 ++++----
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index b0336b12e5..2a902e422c 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -656,7 +656,7 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
<programlisting>
WARNING: database "mydb" must be vacuumed within 39985967 transactions
-HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
+HINT: To prevent entering read-only mode, execute a database-wide VACUUM in that database.
</programlisting>
(A manual <command>VACUUM</command> should fix the problem, as suggested by the
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index fece6cf31b..485d8ebf81 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -148,14 +148,14 @@ GetNewTransactionId(bool isSubXact)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
@@ -463,14 +463,14 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
--
2.39.2
v3-0004-Don-t-recommend-running-VACUUM-in-a-single-user-m.patchapplication/octet-stream; name=v3-0004-Don-t-recommend-running-VACUUM-in-a-single-user-m.patchDownload
From 66465d20dcd68b168f4510b64edc48acf7229dfd Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 15:37:12 +0300
Subject: [PATCH v3 4/4] Don't recommend running VACUUM in a single-user mode
when reaching xidStopLimit
This recommendation is outdated for some time now. A regular VACUUM
works just fine. Additionally add an explicit note against using a single-user
mode for people who may potentially miss this change.
Author: Aleksander Alekseev
Reported-by: Hannu Krosing
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
Discussion: https://postgr.es/m/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 15 +++++++++------
src/backend/access/transam/varsup.c | 4 ++--
2 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 2a902e422c..12bc7af01f 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -668,7 +668,7 @@ HINT: To prevent entering read-only mode, execute a database-wide VACUUM in tha
<programlisting>
ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
-HINT: Stop the postmaster and vacuum that database in single-user mode.
+HINT: VACUUM that database.
</programlisting>
In this condition any transactions already started can continue.
@@ -676,13 +676,16 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However
- the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. See the
- <xref linkend="app-postgres"/> reference page for details about using
- single-user mode.
+ required <command>VACUUM</command> commands.
</para>
+ <note>
+ <para>
+ Previously it was required to stop the postmaster and VACUUM the database
+ in a single-user mode. There is no need to use a single-user mode anymore.
+ </para>
+ </note>
+
<sect3 id="vacuum-for-multixact-wraparound">
<title>Multixacts and Wraparound</title>
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 485d8ebf81..ef1f211d32 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -128,14 +128,14 @@ GetNewTransactionId(bool isSubXact)
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("VACUUM that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("VACUUM that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
--
2.39.2
On Tue, Mar 21, 2023 at 6:44 PM Aleksander Alekseev <
aleksander@timescale.com> wrote:
Okay, the changes look good. To go further, I think we need to combine into
two patches, one with 0001-0003 and one with 0004:
1. Correct false statements about "shutdown" etc. This should contain
changes that can safely be patched all the way to v11.
2. Change bad advice (single-user mode) into good advice. We can target
head first, and then try to adopt as far back as we safely can (and test).
(...and future work, so not part of the CF here) 3. Tell the user what
caused the problem, instead of saying "go figure it out yourself".
In swapping this topic back in my head, I also saw [2] where Robert
suggested
"that old prepared transactions and stale replication
slots should be emphasized more prominently. Maybe something like:HINT: Commit or roll back old prepared transactions, drop stale
replication slots, or kill long-running sessions.
Ensure that autovacuum is progressing, or run a manual database-wide
VACUUM."
It looks like the hint regarding replication slots was added at some
point. Currently we have:```
errhint( [...]
"You might also need to commit or roll back old prepared
transactions, or drop stale replication slots.")));
```
Yes, the exact same text as it appeared in the [2] thread above, which
prompted Robert's comment I quoted. I take the point to mean: All of these
things need to be taken care of *first*, before vacuuming, so the hint
should order things so that it is clear.
Please let me know if you think
we should also add a suggestion to kill long-running sessions, etc.
+1 for also adding that.
- errmsg("database is not accepting commands to avoid wraparound data loss
in database \"%s\"",
+ errmsg("database is not accepting commands that generate new XIDs to
avoid wraparound data loss in database \"%s\"",
I'm not quite on board with the new message, but welcome additional
opinions. For one, it's a bit longer and now ambiguous. I also bet that
"generate XIDs" doesn't really communicate anything useful. The people who
understand exactly what that means, and what the consequences are, are
unlikely to let the system get near wraparound in the first place, and
might even know enough to ignore the hint.
I'm thinking we might need to convey something about "writes". While it's
less technically correct, I imagine it's more useful. Remember, many users
have it drilled in their heads that they need to drop immediately to
single-user mode. I'd like to give some idea of what they can and cannot do.
+ Previously it was required to stop the postmaster and VACUUM the
database
+ in a single-user mode. There is no need to use a single-user mode
anymore.
I think we need to go further and actively warn against it: It's slow,
impossible to monitor, disables replication and disables safeguards against
wraparound. (Other bad things too, but these are easily understandable for
users)
Maybe mention also that it's main use in wraparound situations is for a way
to perform DROPs and TRUNCATEs if that would help speed up resolution.
I propose for discussion that 0004 should show in the docs all the queries
for finding prepared xacts, repl slots etc. If we ever show the info at
runtime, we can dispense with the queries, but there seems to be no urgency
for that...
--
John Naylor
EDB: http://www.enterprisedb.com
Hi John,
Many thanks for all the great feedback!
Okay, the changes look good. To go further, I think we need to combine into two patches, one with 0001-0003 and one with 0004:
1. Correct false statements about "shutdown" etc. This should contain changes that can safely be patched all the way to v11.
2. Change bad advice (single-user mode) into good advice. We can target head first, and then try to adopt as far back as we safely can (and test).
Done.
In swapping this topic back in my head, I also saw [2] where Robert suggested
"that old prepared transactions and stale replication
slots should be emphasized more prominently. Maybe something like:HINT: Commit or roll back old prepared transactions, drop stale
replication slots, or kill long-running sessions.
Ensure that autovacuum is progressing, or run a manual database-wide VACUUM."It looks like the hint regarding replication slots was added at some
point. Currently we have:```
errhint( [...]
"You might also need to commit or roll back old prepared
transactions, or drop stale replication slots.")));
```Yes, the exact same text as it appeared in the [2] thread above, which prompted Robert's comment I quoted. I take the point to mean: All of these things need to be taken care of *first*, before vacuuming, so the hint should order things so that it is clear.
Please let me know if you think
we should also add a suggestion to kill long-running sessions, etc.+1 for also adding that.
OK, done. I included this change as a separate patch. It can be
squashed with another one if necessary.
While on it, I noticed that multixact.c still talks about a
"shutdown". I made corresponding changes in 0001.
- errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"", + errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",I'm not quite on board with the new message, but welcome additional opinions. For one, it's a bit longer and now ambiguous. I also bet that "generate XIDs" doesn't really communicate anything useful. The people who understand exactly what that means, and what the consequences are, are unlikely to let the system get near wraparound in the first place, and might even know enough to ignore the hint.
I'm thinking we might need to convey something about "writes". While it's less technically correct, I imagine it's more useful. Remember, many users have it drilled in their heads that they need to drop immediately to single-user mode. I'd like to give some idea of what they can and cannot do.
This particular wording was chosen for consistency with multixact.c:
```
errmsg("database is not accepting commands that generate new
MultiXactIds to avoid wraparound data loss in database \"%s\"",
```
The idea of using "writes" is sort of OK, but note that the same
message will appear for a query like:
```
SELECT pg_current_xact_id();
```
... which doesn't do writes. The message will be misleading in this case.
On top of that, although a PostgreSQL user may not be aware of
MultiXactIds, arguably there are many users that are aware of XIDs.
Not to mention the fact that XIDs are well documented.
I didn't make this change in v4 since it seems to be controversial and
probably not the highest priority at the moment. I suggest we discuss
it separately.
I propose for discussion that 0004 should show in the docs all the queries for finding prepared xacts, repl slots etc. If we ever show the info at runtime, we can dispense with the queries, but there seems to be no urgency for that...
Good idea.
+ Previously it was required to stop the postmaster and VACUUM the database + in a single-user mode. There is no need to use a single-user mode anymore.I think we need to go further and actively warn against it: It's slow, impossible to monitor, disables replication and disables safeguards against wraparound. (Other bad things too, but these are easily understandable for users)
Maybe mention also that it's main use in wraparound situations is for a way to perform DROPs and TRUNCATEs if that would help speed up resolution.
Fixed.
--
Best regards,
Aleksander Alekseev
Attachments:
v4-0001-Correct-the-docs-and-messages-about-preventing-XI.patchapplication/octet-stream; name=v4-0001-Correct-the-docs-and-messages-about-preventing-XI.patchDownload
From 9c3b184331c734c0178bd0e7e8e507477023dd54 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 12:23:30 +0300
Subject: [PATCH v4 1/3] Correct the docs and messages about preventing XID
wraparound
Prior to this commit the documentation was somewhat misleading. It stated that
the system will "shutdown" in order to prevent XID wraparound. This is not
what actually happens though.
Actually the system refuses to allocate new XIDs. XIDs are allocated lazily
and not needed for checking the visibility rules. Thus read-only transaction
will continues to execute. Additionally, read-write transactions that are
in progress and have an allocated XID will continue to execute and can be
committed.
Additionally, fix the messages shown when approaching xidWrapLimit,
xidWarnLimit and multiWarnLimit accordingly.
Author: Aleksander Alekseev
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 22 +++++++++++-----------
src/backend/access/transam/multixact.c | 4 ++--
src/backend/access/transam/varsup.c | 12 ++++++------
3 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 9cf9d030a8..2a902e422c 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -656,31 +656,31 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
<programlisting>
WARNING: database "mydb" must be vacuumed within 39985967 transactions
-HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
+HINT: To prevent entering read-only mode, execute a database-wide VACUUM in that database.
</programlisting>
(A manual <command>VACUUM</command> should fix the problem, as suggested by the
hint; but note that the <command>VACUUM</command> must be performed by a
superuser, else it will fail to process system catalogs and thus not
be able to advance the database's <structfield>datfrozenxid</structfield>.)
- If these warnings are
- ignored, the system will shut down and refuse to start any new
- transactions once there are fewer than three million transactions left
- until wraparound:
+ If these warnings are ignored, the system will refuse to allocate new XIDs
+ once there are fewer than three million transactions left until wraparound:
<programlisting>
-ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
+ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
+ In this condition any transactions already started can continue.
+ Only new read-only transactions can be started.
+
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However, since the system will not
- execute commands once it has gone into the safety shutdown mode,
+ required <command>VACUUM</command> commands. However
the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
- in single-user mode. See the <xref linkend="app-postgres"/> reference
- page for details about using single-user mode.
+ mode to execute <command>VACUUM</command>. See the
+ <xref linkend="app-postgres"/> reference page for details about using
+ single-user mode.
</para>
<sect3 id="vacuum-for-multixact-wraparound">
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index fe6698d5ff..1dd29c02cc 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2335,7 +2335,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit - curMulti,
oldest_datname,
multiWrapLimit - curMulti),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
@@ -2344,7 +2344,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit - curMulti,
oldest_datoid,
multiWrapLimit - curMulti),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 334adac09e..485d8ebf81 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -126,14 +126,14 @@ GetNewTransactionId(bool isSubXact)
if (oldest_datname)
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
@@ -148,14 +148,14 @@ GetNewTransactionId(bool isSubXact)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
@@ -463,14 +463,14 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
--
2.39.2
v4-0002-Don-t-recommend-running-VACUUM-in-a-single-user-m.patchapplication/octet-stream; name=v4-0002-Don-t-recommend-running-VACUUM-in-a-single-user-m.patchDownload
From dce05503162c274b5496cde5b2e7cefcb78452ba Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 15:37:12 +0300
Subject: [PATCH v4 2/3] Don't recommend running VACUUM in a single-user mode
when reaching xidStopLimit
This recommendation is outdated for some time now. A regular VACUUM
works just fine. Additionally add an explicit note against using a single-user
mode for people who may potentially miss this change.
Author: Aleksander Alekseev
Reported-by: Hannu Krosing
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
Discussion: https://postgr.es/m/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 20 ++++++++++++++------
src/backend/access/transam/varsup.c | 4 ++--
2 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 2a902e422c..f0d913c94f 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -668,7 +668,7 @@ HINT: To prevent entering read-only mode, execute a database-wide VACUUM in tha
<programlisting>
ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
-HINT: Stop the postmaster and vacuum that database in single-user mode.
+HINT: VACUUM that database.
</programlisting>
In this condition any transactions already started can continue.
@@ -676,13 +676,21 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However
- the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. See the
- <xref linkend="app-postgres"/> reference page for details about using
- single-user mode.
+ required <command>VACUUM</command> commands.
</para>
+ <note>
+ <para>
+ Previously it was required to stop the postmaster and
+ <command>VACUUM</command> the database in a single-user mode. There is no
+ need to use a single-user mode anymore unless the resolution needs to be
+ speeded up by performing <command>TRUNCATE</command> or
+ <command>DROP</command>. Except for this scenario it's strongly advised
+ against using a single-user mode: it requires downtime, can't be monitored,
+ disables replication, disables safeguards against wraparound, etc.
+ </para>
+ </note>
+
<sect3 id="vacuum-for-multixact-wraparound">
<title>Multixacts and Wraparound</title>
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 485d8ebf81..ef1f211d32 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -128,14 +128,14 @@ GetNewTransactionId(bool isSubXact)
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("VACUUM that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("VACUUM that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
--
2.39.2
v4-0003-Modify-the-hints-about-preventing-XID-wraparound.patchapplication/octet-stream; name=v4-0003-Modify-the-hints-about-preventing-XID-wraparound.patchDownload
From 1f804c86796be37ce6154b9235ae64e7ed8f0196 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 3 Apr 2023 14:00:38 +0300
Subject: [PATCH v4 3/3] Modify the hints about preventing XID wraparound
The new hint messages provide a comprehensive list of actionable items that
help preventing XID wraparound. Previously the list was incomplete.
Per suggestion by Robert Haas.
Author: Aleksander Alekseev
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
src/backend/access/transam/multixact.c | 12 ++++++------
src/backend/access/transam/varsup.c | 12 ++++++------
src/backend/commands/vacuum.c | 4 ++--
3 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 1dd29c02cc..5f922f3afd 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1023,14 +1023,14 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
errmsg("database is not accepting commands that generate new MultiXactIds to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("Execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new MultiXactIds to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("Execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
}
/*
@@ -1054,7 +1054,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
oldest_datname,
multiWrapLimit - result),
errhint("Execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
else
ereport(WARNING,
(errmsg_plural("database with OID %u must be vacuumed before %u more MultiXactId is used",
@@ -1063,7 +1063,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
oldest_datoid,
multiWrapLimit - result),
errhint("Execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
}
/* Re-acquire lock and start over */
@@ -2336,7 +2336,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
oldest_datname,
multiWrapLimit - curMulti),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
else
ereport(WARNING,
(errmsg_plural("database with OID %u must be vacuumed before %u more MultiXactId is used",
@@ -2345,7 +2345,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
oldest_datoid,
multiWrapLimit - curMulti),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
}
}
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index ef1f211d32..186d394df2 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -129,14 +129,14 @@ GetNewTransactionId(bool isSubXact)
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("VACUUM that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("VACUUM that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
}
else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
{
@@ -149,14 +149,14 @@ GetNewTransactionId(bool isSubXact)
oldest_datname,
xidWrapLimit - xid),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - xid),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
}
/* Re-acquire lock and start over */
@@ -464,14 +464,14 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
oldest_datname,
xidWrapLimit - curXid),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - curXid),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
}
}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 1c3437336d..e9162f0534 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1047,12 +1047,12 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams *params,
ereport(WARNING,
(errmsg("cutoff for removing and freezing tuples is far in the past"),
errhint("Close open transactions soon to avoid wraparound problems.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
if (MultiXactIdPrecedes(cutoffs->OldestMxact, safeOldestMxact))
ereport(WARNING,
(errmsg("cutoff for freezing multixacts is far in the past"),
errhint("Close open transactions soon to avoid wraparound problems.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
/*
* Determine the minimum freeze age to use: as specified by the caller, or
--
2.39.2
On Mon, Apr 3, 2023 at 7:33 PM Aleksander Alekseev <aleksander@timescale.com>
wrote:
Yes, the exact same text as it appeared in the [2] thread above, which
prompted Robert's comment I quoted. I take the point to mean: All of these
things need to be taken care of *first*, before vacuuming, so the hint
should order things so that it is clear.
Please let me know if you think
we should also add a suggestion to kill long-running sessions, etc.+1 for also adding that.
OK, done. I included this change as a separate patch. It can be
squashed with another one if necessary.
Okay, great. For v4-0003:
Each hint mentions vacuum twice, because it's adding the vacuum message at
the end, but not removing it from the beginning. The vacuum string should
be on its own line, since we will have to modify that for back branches
(skip indexes and truncation).
I'm also having second thoughts about "Ensure that autovacuum is
progressing" in the hint. That might be better in the docs, if we decide to
go ahead with adding a specific checklist there.
In vacuum.c:
errhint("Close open transactions soon to avoid wraparound problems.\n"
- "You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions,
drop stale replication slots, or kill long-running sessions. Ensure that
autovacuum is progressing, or run a manual database-wide VACUUM.")));
This appears in vacuum_get_cutoffs(), which is called by vacuum and
cluster, and the open transactions were already mentioned, so this is not
the place for this change.
This particular wording was chosen for consistency with multixact.c:
```
errmsg("database is not accepting commands that generate new
MultiXactIds to avoid wraparound data loss in database \"%s\"",
```
Okay, I didn't look into that -- seems like a good precedent.
v4-0002:
- errhint("Stop the postmaster and vacuum that database in single-user
mode.\n"
+ errhint("VACUUM that database.\n"
Further in the spirit of consistency, the mulixact path already has
"Execute a database-wide VACUUM in that database.\n", and that seems like
better wording.
--
John Naylor
EDB: http://www.enterprisedb.com
Hi!
I've looked into the patches v4.
For 0001:
I think long "not accepting commands that generate" is equivalent to
more concise "can't generate".
For 0003:
I think double mentioning of Vacuum at each errhist i.e.: "Execute a
database-wide VACUUM in that database" and "...or run a manual
database-wide VACUUM." are redundant. The advice "Ensure that
autovacuum is progressing,..." is also not needed after advice to
"Execute a database-wide VACUUM in that database".
For all:
In a errhint's list what _might_ be done I think AND is a little bit
better that OR as the word _might_ means that each of the proposals in
the list is a probable, not a sure one.
The proposed changes are in patchset v5.
Kind regards,
Pavel Borisov,
Supabase.
Attachments:
v5-0002-This-recommendation-is-outdated-for-some-time-now.patchapplication/octet-stream; name=v5-0002-This-recommendation-is-outdated-for-some-time-now.patchDownload
From 4633ec986fce07c22ca094212c86ca16d5c9c887 Mon Sep 17 00:00:00 2001
From: Pavel Borisov <pashkin.elfe@gmail.com>
Date: Tue, 4 Apr 2023 13:26:14 +0400
Subject: [PATCH v5 2/3] This recommendation is outdated for some time now. A
regular VACUUM works just fine. Additionally add an explicit note against
using a single-user mode for people who may potentially miss this change.
Author: Aleksander Alekseev
Reported-by: Hannu Krosing
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
Discussion: https://postgr.es/m/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 20 ++++++++++++++------
src/backend/access/transam/varsup.c | 4 ++--
2 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index b7254344ee0..66456b5e462 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -668,7 +668,7 @@ HINT: To prevent entering read-only mode, execute a database-wide VACUUM in tha
<programlisting>
ERROR: database can't generate new XIDs to avoid wraparound data loss in database "mydb"
-HINT: Stop the postmaster and vacuum that database in single-user mode.
+HINT: VACUUM that database.
</programlisting>
In this condition any transactions already started can continue.
@@ -676,13 +676,21 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However
- the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. See the
- <xref linkend="app-postgres"/> reference page for details about using
- single-user mode.
+ required <command>VACUUM</command> commands.
</para>
+ <note>
+ <para>
+ Previously it was required to stop the postmaster and
+ <command>VACUUM</command> the database in a single-user mode. There is no
+ need to use a single-user mode anymore unless the resolution needs to be
+ speeded up by performing <command>TRUNCATE</command> or
+ <command>DROP</command>. Except for this scenario it's strongly advised
+ against using a single-user mode: it requires downtime, can't be monitored,
+ disables replication, disables safeguards against wraparound, etc.
+ </para>
+ </note>
+
<sect3 id="vacuum-for-multixact-wraparound">
<title>Multixacts and Wraparound</title>
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index bd27576bd58..f039c98cd09 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -128,14 +128,14 @@ GetNewTransactionId(bool isSubXact)
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database can't generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("VACUUM that database.\n"
"You might also need to commit or roll back old prepared transactions and drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database can't generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("VACUUM that database.\n"
"You might also need to commit or roll back old prepared transactions and drop stale replication slots.")));
}
else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
--
2.37.1 (Apple Git-137.1)
v5-0001-Correct-the-docs-and-messages-about-preventing-XI.patchapplication/octet-stream; name=v5-0001-Correct-the-docs-and-messages-about-preventing-XI.patchDownload
From f04e0c576c1415f04fc32d8af792ccfc28527869 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 12:23:30 +0300
Subject: [PATCH v5 1/3] Correct the docs and messages about preventing XID
wraparound
Prior to this commit the documentation was somewhat misleading. It stated that
the system will "shutdown" in order to prevent XID wraparound. This is not
what actually happens though.
Actually the system refuses to allocate new XIDs. XIDs are allocated lazily
and not needed for checking the visibility rules. Thus read-only transaction
will continues to execute. Additionally, read-write transactions that are
in progress and have an allocated XID will continue to execute and can be
committed.
Additionally, fix the messages shown when approaching xidWrapLimit,
xidWarnLimit and multiWarnLimit accordingly.
Author: Aleksander Alekseev
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 22 +++++++++++-----------
src/backend/access/transam/multixact.c | 4 ++--
src/backend/access/transam/varsup.c | 24 ++++++++++++------------
3 files changed, 25 insertions(+), 25 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 9cf9d030a8a..b7254344ee0 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -656,31 +656,31 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
<programlisting>
WARNING: database "mydb" must be vacuumed within 39985967 transactions
-HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
+HINT: To prevent entering read-only mode, execute a database-wide VACUUM in that database.
</programlisting>
(A manual <command>VACUUM</command> should fix the problem, as suggested by the
hint; but note that the <command>VACUUM</command> must be performed by a
superuser, else it will fail to process system catalogs and thus not
be able to advance the database's <structfield>datfrozenxid</structfield>.)
- If these warnings are
- ignored, the system will shut down and refuse to start any new
- transactions once there are fewer than three million transactions left
- until wraparound:
+ If these warnings are ignored, the system will refuse to allocate new XIDs
+ once there are fewer than three million transactions left until wraparound:
<programlisting>
-ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
+ERROR: database can't generate new XIDs to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
+ In this condition any transactions already started can continue.
+ Only new read-only transactions can be started.
+
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However, since the system will not
- execute commands once it has gone into the safety shutdown mode,
+ required <command>VACUUM</command> commands. However
the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
- in single-user mode. See the <xref linkend="app-postgres"/> reference
- page for details about using single-user mode.
+ mode to execute <command>VACUUM</command>. See the
+ <xref linkend="app-postgres"/> reference page for details about using
+ single-user mode.
</para>
<sect3 id="vacuum-for-multixact-wraparound">
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index fe6698d5ffa..1dd29c02cc5 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2335,7 +2335,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit - curMulti,
oldest_datname,
multiWrapLimit - curMulti),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
@@ -2344,7 +2344,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit - curMulti,
oldest_datoid,
multiWrapLimit - curMulti),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 334adac09e8..bd27576bd58 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -126,17 +126,17 @@ GetNewTransactionId(bool isSubXact)
if (oldest_datname)
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
+ errmsg("database can't generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions and drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u",
+ errmsg("database can't generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions and drop stale replication slots.")));
}
else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
{
@@ -148,15 +148,15 @@ GetNewTransactionId(bool isSubXact)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
+ "You might also need to commit or roll back old prepared transactions and drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
+ "You might also need to commit or roll back old prepared transactions, and drop stale replication slots.")));
}
/* Re-acquire lock and start over */
@@ -463,15 +463,15 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
+ "You might also need to commit or roll back old prepared transactions and drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
+ "You might also need to commit or roll back old prepared transactions and drop stale replication slots.")));
}
}
--
2.37.1 (Apple Git-137.1)
v5-0003-Modify-the-hints-about-preventing-XID-wraparound.patchapplication/octet-stream; name=v5-0003-Modify-the-hints-about-preventing-XID-wraparound.patchDownload
From 2c450d9a21761fcbd3d54b9e88967c901e779c0d Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 3 Apr 2023 14:00:38 +0300
Subject: [PATCH v5 3/3] Modify the hints about preventing XID wraparound
The new hint messages provide a comprehensive list of actionable items that
help preventing XID wraparound. Previously the list was incomplete.
Per suggestion by Robert Haas.
Author: Aleksander Alekseev
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
src/backend/access/transam/multixact.c | 12 ++++++------
src/backend/access/transam/varsup.c | 12 ++++++------
src/backend/commands/vacuum.c | 4 ++--
3 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 1dd29c02cc5..5f922f3afd5 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1023,14 +1023,14 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
errmsg("database is not accepting commands that generate new MultiXactIds to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("Execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new MultiXactIds to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("Execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
}
/*
@@ -1054,7 +1054,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
oldest_datname,
multiWrapLimit - result),
errhint("Execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
else
ereport(WARNING,
(errmsg_plural("database with OID %u must be vacuumed before %u more MultiXactId is used",
@@ -1063,7 +1063,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
oldest_datoid,
multiWrapLimit - result),
errhint("Execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
}
/* Re-acquire lock and start over */
@@ -2336,7 +2336,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
oldest_datname,
multiWrapLimit - curMulti),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
else
ereport(WARNING,
(errmsg_plural("database with OID %u must be vacuumed before %u more MultiXactId is used",
@@ -2345,7 +2345,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
oldest_datoid,
multiWrapLimit - curMulti),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
}
}
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index f039c98cd09..52e456b70c7 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -129,14 +129,14 @@ GetNewTransactionId(bool isSubXact)
errmsg("database can't generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("VACUUM that database.\n"
- "You might also need to commit or roll back old prepared transactions and drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots and kill long-running sessions.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database can't generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("VACUUM that database.\n"
- "You might also need to commit or roll back old prepared transactions and drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots and kill long-running sessions.")));
}
else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
{
@@ -149,14 +149,14 @@ GetNewTransactionId(bool isSubXact)
oldest_datname,
xidWrapLimit - xid),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions and drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots and kill long-running sessions.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - xid),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, and drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots and kill long-running sessions.")));
}
/* Re-acquire lock and start over */
@@ -464,14 +464,14 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
oldest_datname,
xidWrapLimit - curXid),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions and drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots and kill long-running sessions.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - curXid),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions and drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots and kill long-running sessions.")));
}
}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index da85330ef40..e8f9fa052a6 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1049,12 +1049,12 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams *params,
ereport(WARNING,
(errmsg("cutoff for removing and freezing tuples is far in the past"),
errhint("Close open transactions soon to avoid wraparound problems.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
if (MultiXactIdPrecedes(cutoffs->OldestMxact, safeOldestMxact))
ereport(WARNING,
(errmsg("cutoff for freezing multixacts is far in the past"),
errhint("Close open transactions soon to avoid wraparound problems.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));
/*
* Determine the minimum freeze age to use: as specified by the caller, or
--
2.37.1 (Apple Git-137.1)
Hi,
The proposed changes are in patchset v5.
Pavel, John, thanks for your feedback.
I've looked into the patches v4.
For 0001:
I think long "not accepting commands that generate" is equivalent to
more concise "can't generate".
Frankly I don't think this is a good change for this particular CF
entry and it violates the consistency with multixact.c. Additionally
the new message is not accurate. The DBMS _can_ generate new XIDs,
quite a few of them actually. It merely refuses to do so.
For all:
In a errhint's list what _might_ be done I think AND is a little bit
better that OR as the word _might_ means that each of the proposals in
the list is a probable, not a sure one.
Well, that's debatable... IMO "or" makes a bit more sense since the
user knows better whether he/she needs to kill a long-running
transaction, or run VACUUM, or maybe do both. "And" would imply that
the user needs to do all of it, which is not necessarily true. Since
originally it was "or" I suggest we keep it as is for now.
All in all I would prefer keeping the focus on the particular problem
the patch tries to address.
For 0003:
I think double mentioning of Vacuum at each errhist i.e.: "Execute a
database-wide VACUUM in that database" and "...or run a manual
database-wide VACUUM." are redundant. The advice "Ensure that
autovacuum is progressing,..." is also not needed after advice to
"Execute a database-wide VACUUM in that database".
[...]
Okay, great. For v4-0003:
Each hint mentions vacuum twice, because it's adding the vacuum message at the end, but not removing it from the beginning. The vacuum string should be on its own line, since we will have to modify that for back branches (skip indexes and truncation).
My bad. Fixed.
I'm also having second thoughts about "Ensure that autovacuum is progressing" in the hint. That might be better in the docs, if we decide to go ahead with adding a specific checklist there.
OK, done.
In vacuum.c:
errhint("Close open transactions soon to avoid wraparound problems.\n" - "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); + "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));This appears in vacuum_get_cutoffs(), which is called by vacuum and cluster, and the open transactions were already mentioned, so this is not the place for this change.
Fixed.
v4-0002:
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n" + errhint("VACUUM that database.\n"Further in the spirit of consistency, the mulixact path already has "Execute a database-wide VACUUM in that database.\n", and that seems like better wording.
Agree. Fixed.
--
Best regards,
Aleksander Alekseev
Attachments:
v6-0001-Correct-the-docs-and-messages-about-preventing-XI.patchapplication/octet-stream; name=v6-0001-Correct-the-docs-and-messages-about-preventing-XI.patchDownload
From ebb5b0278d82bcfb566703d78d3e6f08c7b70c02 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 12:23:30 +0300
Subject: [PATCH v6 1/3] Correct the docs and messages about preventing XID
wraparound
Prior to this commit the documentation was somewhat misleading. It stated that
the system will "shutdown" in order to prevent XID wraparound. This is not
what actually happens though.
Actually the system refuses to allocate new XIDs. XIDs are allocated lazily
and not needed for checking the visibility rules. Thus read-only transaction
will continues to execute. Additionally, read-write transactions that are
in progress and have an allocated XID will continue to execute and can be
committed.
Additionally, fix the messages shown when approaching xidWrapLimit,
xidWarnLimit and multiWarnLimit accordingly.
Author: Aleksander Alekseev
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 22 +++++++++++-----------
src/backend/access/transam/multixact.c | 4 ++--
src/backend/access/transam/varsup.c | 12 ++++++------
3 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 9cf9d030a8..2a902e422c 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -656,31 +656,31 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
<programlisting>
WARNING: database "mydb" must be vacuumed within 39985967 transactions
-HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
+HINT: To prevent entering read-only mode, execute a database-wide VACUUM in that database.
</programlisting>
(A manual <command>VACUUM</command> should fix the problem, as suggested by the
hint; but note that the <command>VACUUM</command> must be performed by a
superuser, else it will fail to process system catalogs and thus not
be able to advance the database's <structfield>datfrozenxid</structfield>.)
- If these warnings are
- ignored, the system will shut down and refuse to start any new
- transactions once there are fewer than three million transactions left
- until wraparound:
+ If these warnings are ignored, the system will refuse to allocate new XIDs
+ once there are fewer than three million transactions left until wraparound:
<programlisting>
-ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
+ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
+ In this condition any transactions already started can continue.
+ Only new read-only transactions can be started.
+
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However, since the system will not
- execute commands once it has gone into the safety shutdown mode,
+ required <command>VACUUM</command> commands. However
the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
- in single-user mode. See the <xref linkend="app-postgres"/> reference
- page for details about using single-user mode.
+ mode to execute <command>VACUUM</command>. See the
+ <xref linkend="app-postgres"/> reference page for details about using
+ single-user mode.
</para>
<sect3 id="vacuum-for-multixact-wraparound">
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index fe6698d5ff..1dd29c02cc 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2335,7 +2335,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit - curMulti,
oldest_datname,
multiWrapLimit - curMulti),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
@@ -2344,7 +2344,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit - curMulti,
oldest_datoid,
multiWrapLimit - curMulti),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 334adac09e..485d8ebf81 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -126,14 +126,14 @@ GetNewTransactionId(bool isSubXact)
if (oldest_datname)
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
@@ -148,14 +148,14 @@ GetNewTransactionId(bool isSubXact)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
@@ -463,14 +463,14 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
--
2.40.0
v6-0002-Don-t-recommend-running-VACUUM-in-a-single-user-m.patchapplication/octet-stream; name=v6-0002-Don-t-recommend-running-VACUUM-in-a-single-user-m.patchDownload
From 7ff56f0b9b68541a2403175a29f2a179d4ecb3c8 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 16 Jan 2023 15:37:12 +0300
Subject: [PATCH v6 2/3] Don't recommend running VACUUM in a single-user mode
when reaching xidStopLimit
This recommendation is outdated for some time now. A regular VACUUM
works just fine. Additionally add an explicit note against using a single-user
mode for people who may potentially miss this change.
Author: Aleksander Alekseev
Reported-by: Hannu Krosing
Reviewed-by: John Naylor, Pavel Borisov
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
Discussion: https://postgr.es/m/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 20 ++++++++++++++------
src/backend/access/transam/varsup.c | 4 ++--
2 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 2a902e422c..e6b24f3258 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -668,7 +668,7 @@ HINT: To prevent entering read-only mode, execute a database-wide VACUUM in tha
<programlisting>
ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
-HINT: Stop the postmaster and vacuum that database in single-user mode.
+HINT: Execute a database-wide VACUUM in that database.
</programlisting>
In this condition any transactions already started can continue.
@@ -676,13 +676,21 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However
- the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. See the
- <xref linkend="app-postgres"/> reference page for details about using
- single-user mode.
+ required <command>VACUUM</command> commands.
</para>
+ <note>
+ <para>
+ Previously it was required to stop the postmaster and
+ <command>VACUUM</command> the database in a single-user mode. There is no
+ need to use a single-user mode anymore unless the resolution needs to be
+ speeded up by performing <command>TRUNCATE</command> or
+ <command>DROP</command>. Except for this scenario it's strongly advised
+ against using a single-user mode: it requires downtime, can't be monitored,
+ disables replication, disables safeguards against wraparound, etc.
+ </para>
+ </note>
+
<sect3 id="vacuum-for-multixact-wraparound">
<title>Multixacts and Wraparound</title>
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 485d8ebf81..667da0fd64 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -128,14 +128,14 @@ GetNewTransactionId(bool isSubXact)
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("Execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("Execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
--
2.40.0
v6-0003-Modify-the-hints-about-preventing-XID-wraparound.patchapplication/octet-stream; name=v6-0003-Modify-the-hints-about-preventing-XID-wraparound.patchDownload
From 228d4d10bf555f9e40b86fbce805a7bf9b6e9b8b Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 3 Apr 2023 14:00:38 +0300
Subject: [PATCH v6 3/3] Modify the hints about preventing XID wraparound
Per suggestion by Robert Haas.
Author: Aleksander Alekseev
Reviewed-by: John Naylor, Pavel Borisov
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
src/backend/access/transam/multixact.c | 12 ++++++------
src/backend/access/transam/varsup.c | 12 ++++++------
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 1dd29c02cc..5321719d92 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1023,14 +1023,14 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
errmsg("database is not accepting commands that generate new MultiXactIds to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("Execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new MultiXactIds to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("Execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions.")));
}
/*
@@ -1054,7 +1054,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
oldest_datname,
multiWrapLimit - result),
errhint("Execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions.")));
else
ereport(WARNING,
(errmsg_plural("database with OID %u must be vacuumed before %u more MultiXactId is used",
@@ -1063,7 +1063,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
oldest_datoid,
multiWrapLimit - result),
errhint("Execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions.")));
}
/* Re-acquire lock and start over */
@@ -2336,7 +2336,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
oldest_datname,
multiWrapLimit - curMulti),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions.")));
else
ereport(WARNING,
(errmsg_plural("database with OID %u must be vacuumed before %u more MultiXactId is used",
@@ -2345,7 +2345,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
oldest_datoid,
multiWrapLimit - curMulti),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions.")));
}
}
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 667da0fd64..fd93793e0b 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -129,14 +129,14 @@ GetNewTransactionId(bool isSubXact)
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("Execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("Execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions.")));
}
else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
{
@@ -149,14 +149,14 @@ GetNewTransactionId(bool isSubXact)
oldest_datname,
xidWrapLimit - xid),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - xid),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions.")));
}
/* Re-acquire lock and start over */
@@ -464,14 +464,14 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
oldest_datname,
xidWrapLimit - curXid),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - curXid),
errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
- "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions.")));
}
}
--
2.40.0
On Tue, 4 Apr 2023 at 17:08, Aleksander Alekseev
<aleksander@timescale.com> wrote:
Hi,
The proposed changes are in patchset v5.
Pavel, John, thanks for your feedback.
I've looked into the patches v4.
For 0001:
I think long "not accepting commands that generate" is equivalent to
more concise "can't generate".Frankly I don't think this is a good change for this particular CF
entry and it violates the consistency with multixact.c. Additionally
the new message is not accurate. The DBMS _can_ generate new XIDs,
quite a few of them actually. It merely refuses to do so.For all:
In a errhint's list what _might_ be done I think AND is a little bit
better that OR as the word _might_ means that each of the proposals in
the list is a probable, not a sure one.Well, that's debatable... IMO "or" makes a bit more sense since the
user knows better whether he/she needs to kill a long-running
transaction, or run VACUUM, or maybe do both. "And" would imply that
the user needs to do all of it, which is not necessarily true. Since
originally it was "or" I suggest we keep it as is for now.All in all I would prefer keeping the focus on the particular problem
the patch tries to address.For 0003:
I think double mentioning of Vacuum at each errhist i.e.: "Execute a
database-wide VACUUM in that database" and "...or run a manual
database-wide VACUUM." are redundant. The advice "Ensure that
autovacuum is progressing,..." is also not needed after advice to
"Execute a database-wide VACUUM in that database".
[...]Okay, great. For v4-0003:
Each hint mentions vacuum twice, because it's adding the vacuum message at the end, but not removing it from the beginning. The vacuum string should be on its own line, since we will have to modify that for back branches (skip indexes and truncation).
My bad. Fixed.
I'm also having second thoughts about "Ensure that autovacuum is progressing" in the hint. That might be better in the docs, if we decide to go ahead with adding a specific checklist there.
OK, done.
In vacuum.c:
errhint("Close open transactions soon to avoid wraparound problems.\n" - "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); + "You might also need to commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions. Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));This appears in vacuum_get_cutoffs(), which is called by vacuum and cluster, and the open transactions were already mentioned, so this is not the place for this change.
Fixed.
v4-0002:
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n" + errhint("VACUUM that database.\n"Further in the spirit of consistency, the mulixact path already has "Execute a database-wide VACUUM in that database.\n", and that seems like better wording.
Agree. Fixed.
Alexander,
Ok, nice! I think it could be moved to committer then.
Pavel.
On Tue, Apr 4, 2023 at 8:08 PM Aleksander Alekseev <aleksander@timescale.com>
wrote:
[v6]
0001:
Looks good to me. I've just made some small edits for v7 and wrote a commit
message to explain how we got here. This can be backpatched all the way, as
it's simply a correction. I do want to test on v11 first just for
completeness. (The reality has already been tested by others back to 9.6
but there's no substitute for trying it yourself). I hope to commit soon
after that.
0002:
I've been testing wraparound using the v3 convenience function in [1]/messages/by-id/CAD21AoBZ3t+fRtVamQTA+wBJaapFUY1dfP08-rxsQ+fouPvgKg@mail.gmail.com to
reach xidStopLimit:
-- reduce log spam
alter system set log_min_messages = error;
alter system set client_min_messages = error;
-- restart
-- no actual replication, just for testing dropping it
select pg_create_physical_replication_slot('foo', true, false);
create table t (i int);
BEGIN;
insert into t values(1);
PREPARE TRANSACTION 'trx_id_pin';
-- get to xidStopLimit
select consume_xids(1*1000*1000*1000);
insert into t values(1);
select consume_xids(1*1000*1000*1000);
insert into t values(1);
select consume_xids( 140*1000*1000);
insert into t values(1);
select consume_xids( 10*1000*1000);
SELECT datname, age(datfrozenxid) FROM pg_database;
-- works just fine
select pg_drop_replication_slot('foo');
COMMIT PREPARED 'trx_id_pin';
-- watch autovacuum take care of it automatically:
SELECT datname, age(datfrozenxid) FROM pg_database;
The consume_xids function builds easily on PG14, but before that it would
need a bit of work because data types changed. That coincidentally was the
first version to include the failsafe, which is convenient in this
scenario. I'd like to do testing on PG12/13 before commit, which would
require turning off truncation in the command (and can also be made faster
by turning off index cleanup), but I'm also okay with going ahead with just
going back to PG14 at first. That also safest.
I made some small changes and wrote a suitably comprehensive commit
message. I separated the possible uses for single-user mode into a separate
paragraph in the "Note:" , moved the justification for the 3-million xid
margin there, and restored the link to how to run it (I already mentioned
we still need this info, but didn't notice this part didn't make it back
in).
0003:
It really needs a more comprehensive change, and just making a long hint
even longer doesn't seem worth doing. I'd like to set that aside and come
back to it. I've left it out of the attached set.
[1]: /messages/by-id/CAD21AoBZ3t+fRtVamQTA+wBJaapFUY1dfP08-rxsQ+fouPvgKg@mail.gmail.com
/messages/by-id/CAD21AoBZ3t+fRtVamQTA+wBJaapFUY1dfP08-rxsQ+fouPvgKg@mail.gmail.com
--
John Naylor
EDB: http://www.enterprisedb.com
Attachments:
v7-0002-Stop-telling-users-to-run-VACUUM-in-a-single-user.patchtext/x-patch; charset=US-ASCII; name=v7-0002-Stop-telling-users-to-run-VACUUM-in-a-single-user.patchDownload
From f5fa3605effd624b263ff3e5dbb8b3e5c8992dba Mon Sep 17 00:00:00 2001
From: John Naylor <john.naylor@postgresql.org>
Date: Sat, 29 Apr 2023 14:23:50 +0700
Subject: [PATCH v7 2/2] Stop telling users to run VACUUM in a single-user mode
Single-user mode is almost always the worst thing to reach for in a
VACUUM emergency:
* Restarting in single user mode requires a shutdown checkpoint
* The user interface is completely different, and awful
* The buffer cache is completely cold
* The checkpointer, background writer and WAL writer are not running
* Without checkpoints WAL segments can not be rotated and reused
* Replication is not running, so after VACUUM is done and database
is started in normal mode, there is a huge backlog to replicate
* pg_stat_progress_vacuum is not available so there is no indication
of when the command will complete
* VACUUM VERBOSE doesn't work - there is no output from single-user
mode vacuum, with or without VERBOSE
If that weren't enough, it's also unsafe because the wraparound
limits are not enforced. It is by no means impossible to corrupt the
database by mistake, such as by a user running VACUUM FULL because it
sounds better.
As mentioned in commit XXXXXXXXX, the system is perfectly capable of
accepting commands when reaching xidStopLimit. Most VACUUM operations
will work normally, with one exception: A new XID is required when
truncating the relation if wal_level is above "minimal". As of v14
the failsafe mechanism disables truncation some time before reaching
xidStopLimit, so this is not an issue in practice.
By remaining in multi-user mode, users still have read-only access to
their database, they can use parallelism, they can use command line
utilities like vacuumdb, and they can remotely access the database
using normal clients.
The only reason to restart in single-user mode is to DROP or TRUNCATE
a table, when it is suspected that doing that would be faster than
vacuuming.
Also add an explicit note warning against using single-user mode.
Backpatch to v14, which is the first version with the VACUUM failsafe.
XXX We should consider v12-13 with "VACUUM (TRUNCATE off, INDEX_CLEANUP
off);", but it's not yet convenient to get to xidStopLimit before v14.
XXX We could consider v11 with careful instructions about redoing
VACUUMs in single-user mode when truncation is necessary.
Aleksander Alekseev, with some adjustments by me
Reported as various times by (at least) Hannu Krosing, Robert Haas,
and Andres Freund
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CA%2BTgmob1QCMJrHwRBK8HZtGsr%2B6cJANRQw2mEgJ9e%3DD%2Bz7cOsw%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CA%2BTgmoYPfofQmRtUan%3DA3aWE9wFsJaOFr%2BW_ys2pPkNPr-2FZw%40mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 32 +++++++++++++++++++++--------
src/backend/access/transam/varsup.c | 4 ++--
2 files changed, 25 insertions(+), 11 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 105a9900cb..ae9a35fb73 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -669,22 +669,36 @@ HINT: To prevent entering read-only mode, execute a database-wide VACUUM in tha
<programlisting>
ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
-HINT: Stop the postmaster and vacuum that database in single-user mode.
+HINT: Execute a database-wide VACUUM in that database.
</programlisting>
In this condition any transactions already in progress can continue,
but only read-only transactions can be started. Operations that
modify database records or truncate relations will fail.
-
- The three-million-transaction safety margin exists to let the
- administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However
- the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. See the
- <xref linkend="app-postgres"/> reference page for details about using
- single-user mode.
+ The <command>VACUUM</command> command can still be run normally to recover.
+ <!-- v12 and v13 need VACUUM (TRUNCATE off, INDEX_CLEANUP off); >
+ <!-- v11 maybe mention that it will often succed, but will need S-U if truncation happens >
</para>
+ <note>
+ <para>
+ In earlier versions it was required to stop the postmaster and
+ <command>VACUUM</command> the database in a single-user mode. There is no
+ need to use single-user mode anymore, and in fact it's strongly
+ discouraged: It increases downtime, makes monitoring impossible,
+ disables replication, bypasses safeguards against wraparound, etc.
+ </para>
+
+ <para>
+ The only reason to use single-user mode in an emergency is to e.g.
+ <command>TRUNCATE</command> or <command>DROP</command> unneeded tables
+ to avoid needing to <command>VACUUM</command> them.
+ The three-million-transaction safety margin exists to let the
+ administrator do this. See the <xref linkend="app-postgres"/>
+ reference page for details about using single-user mode.
+ </para>
+ </note>
+
<sect3 id="vacuum-for-multixact-wraparound">
<title>Multixacts and Wraparound</title>
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 485d8ebf81..667da0fd64 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -128,14 +128,14 @@ GetNewTransactionId(bool isSubXact)
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("Execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("Execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
--
2.39.2
v7-0001-Correct-outdated-docs-and-messages-regarding-XID-.patchtext/x-patch; charset=US-ASCII; name=v7-0001-Correct-outdated-docs-and-messages-regarding-XID-.patchDownload
From fe9ebc1f89e946f2d9e47fdb4f889a3872532338 Mon Sep 17 00:00:00 2001
From: John Naylor <john.naylor@postgresql.org>
Date: Fri, 28 Apr 2023 16:08:33 +0700
Subject: [PATCH v7 1/2] Correct outdated docs and messages regarding XID
limits
Previously, when approaching xidStopLimit or xidWrapLimit, log messages
would warn against a "database shutdown", and when it reached those
limits claimed that it "is not accepting commands". This language
originated in commit 60b2444cc when the xidStopLimit was added in
2005. At that time, even a trivial SELECT would have failed.
Commit 295e63983d in 2007 introduced virtual transaction IDs, which
allowed actual XIDs to be allocated lazily when it is necessary
to do so, such as when modifying database records. Since then, the
behavior at these limits is merely to refuse to allocate new XIDs,
so read-only queries can continue to be initiated.
This has been wrong for a very long time, so backpatch to all
supported branches.
Aleksander Alekseev, with some editing by me
Reviewed by Pavel Borisov
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 22 ++++++++++++----------
src/backend/access/transam/multixact.c | 4 ++--
src/backend/access/transam/varsup.c | 12 ++++++------
3 files changed, 20 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 9cf9d030a8..105a9900cb 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -656,7 +656,7 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
<programlisting>
WARNING: database "mydb" must be vacuumed within 39985967 transactions
-HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
+HINT: To prevent entering read-only mode, execute a database-wide VACUUM in that database.
</programlisting>
(A manual <command>VACUUM</command> should fix the problem, as suggested by the
@@ -664,23 +664,25 @@ HINT: To avoid a database shutdown, execute a database-wide VACUUM in that data
superuser, else it will fail to process system catalogs and thus not
be able to advance the database's <structfield>datfrozenxid</structfield>.)
If these warnings are
- ignored, the system will shut down and refuse to start any new
- transactions once there are fewer than three million transactions left
- until wraparound:
+ ignored, the system will refuse to allocate new XIDs once there are
+ fewer than three million transactions left until wraparound:
<programlisting>
-ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
+ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
+ In this condition any transactions already in progress can continue,
+ but only read-only transactions can be started. Operations that
+ modify database records or truncate relations will fail.
+
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However, since the system will not
- execute commands once it has gone into the safety shutdown mode,
+ required <command>VACUUM</command> commands. However
the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
- in single-user mode. See the <xref linkend="app-postgres"/> reference
- page for details about using single-user mode.
+ mode to execute <command>VACUUM</command>. See the
+ <xref linkend="app-postgres"/> reference page for details about using
+ single-user mode.
</para>
<sect3 id="vacuum-for-multixact-wraparound">
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index fe6698d5ff..1dd29c02cc 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2335,7 +2335,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit - curMulti,
oldest_datname,
multiWrapLimit - curMulti),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
@@ -2344,7 +2344,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit - curMulti,
oldest_datoid,
multiWrapLimit - curMulti),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 334adac09e..485d8ebf81 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -126,14 +126,14 @@ GetNewTransactionId(bool isSubXact)
if (oldest_datname)
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
@@ -148,14 +148,14 @@ GetNewTransactionId(bool isSubXact)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
@@ -463,14 +463,14 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
--
2.39.2
On Sat, Apr 29, 2023 at 1:09 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
Looks good to me.
I'm strongly in favor of this. It's most unfortunate that it took this long.
I've just made some small edits for v7 and wrote a commit message to explain how we got here. This can be backpatched all the way, as it's simply a correction.
+1 to backpatching at least back until v14. After all, it wouldn't
make any sense for us to not backpatch to 14; the whole justification
for doing this was in no way influenced by anything that happened
since the failsafe went in.
I'm also in favor of backpatching to 11, 12, and 13 -- though I
acknowledge that that's more of a judgement call. As you note in
comments in the draft patch, the story with these earlier releases
(especially 11) is a little more complicated for users.
I made some small changes and wrote a suitably comprehensive commit message. I separated the possible uses for single-user mode into a separate paragraph in the "Note:" , moved the justification for the 3-million xid margin there, and restored the link to how to run it (I already mentioned we still need this info, but didn't notice this part didn't make it back in).
I notice that you've called xidStopLimit "read-only mode" in the docs.
I think that it makes sense that you wouldn't use the term
xidStopLimit here, but I'm not sure about this terminology, either. It
seems to me that it should be something quite specific, since we're
talking about a very specific mechanism. Whatever it is, It shouldn't
contain the word "wraparound".
Separately, there is a need to update a couple of other places to use
this new terminology. The documentation for vacuum_sailsafe_age and
vacuum_multixact_failsafe_age refer to "system-wide transaction ID
wraparound failure", which seems less than ideal, even without your
patch.
Do we need two new names? One for xidStopLimit, another for
multiStopLimit? The latter really can't be called read-only mode.
0003:
It really needs a more comprehensive change, and just making a long hint even longer doesn't seem worth doing. I'd like to set that aside and come back to it. I've left it out of the attached set.
Yeah, 0003 can be treated as independent work IMV.
--
Peter Geoghegan
On Sun, Apr 30, 2023 at 4:15 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Sat, Apr 29, 2023 at 1:09 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
I made some small changes and wrote a suitably comprehensive commit
message. I separated the possible uses for single-user mode into a separate
paragraph in the "Note:" , moved the justification for the 3-million xid
margin there, and restored the link to how to run it (I already mentioned
we still need this info, but didn't notice this part didn't make it back
in).
I notice that you've called xidStopLimit "read-only mode" in the docs.
I think that it makes sense that you wouldn't use the term
xidStopLimit here, but I'm not sure about this terminology, either. It
seems to me that it should be something quite specific, since we're
talking about a very specific mechanism. Whatever it is, It shouldn't
contain the word "wraparound".
How about
-HINT: To avoid a database shutdown, [...]
+HINT: To prevent XID exhaustion, [...]
...and "MXID", respectively? We could explain in the docs that vacuum and
read-only queries still work "when XIDs have been exhausted", etc.
(I should probably also add in the commit message that the "shutdown" in
the message was carried over to MXIDs when they arrived also in 2005).
Separately, there is a need to update a couple of other places to use
this new terminology. The documentation for vacuum_sailsafe_age and
vacuum_multixact_failsafe_age refer to "system-wide transaction ID
wraparound failure", which seems less than ideal, even without your
patch.
Right, I'll have a look.
Do we need two new names? One for xidStopLimit, another for
multiStopLimit? The latter really can't be called read-only mode.
Thanks for that correction.
Somewhat related to the now-postponed 0003: I think the docs would do well
to have ordered steps for recovering from both XID and MXID exhaustion. The
previous practice of shutting down had the side-effect of e.g. rolling back
all in-progress transactions whose XID appeared in a MXID but if you remain
in normal mode there is a bit more to check. Manual VACUUM will warn about
"cutoff for removing and freezing tuples is far in the past", but the docs
should be clear on this.
--
John Naylor
EDB: http://www.enterprisedb.com
On Sat, Apr 29, 2023 at 7:30 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
How about
-HINT: To avoid a database shutdown, [...] +HINT: To prevent XID exhaustion, [...]...and "MXID", respectively? We could explain in the docs that vacuum and read-only queries still work "when XIDs have been exhausted", etc.
I think that that particular wording works in this example -- we *are*
avoiding XID exhaustion. But it still doesn't really address my
concern -- at least not on its own. I think that we need a term for
xidStopLimit mode (and perhaps multiStopLimit) itself. This is a
discrete state/mode that is associated with a specific mechanism. I'd
like to emphasize the purpose of xidStopLimit (over when xidStopLimit
happens) in choosing this user-facing name.
As you know, the point of xidStopLimit mode is to give autovacuum the
chance to catch up with managing the XID space through freezing: the
available supply of XIDs doesn't meet present demand, and hasn't for
some time, so it finally came to this. Even if we had true 64-bit XIDs
we'd probably still need something similar -- there would still have
to be *some* point that allowing the "freezing deficit" to continue to
grow just wasn't tenable. If a person consistently spends more than
they take in, their "initial bankroll" isn't necessarily relevant. If
our ~2.1 billion XID "bankroll" wasn't enough to avoid xidStopLimit,
why would we expect 8 billion or 20 billion XIDs to have been enough?
I'm thinking of a user-facing name for xidStopLimit along the lines of
"emergency XID allocation restoration mode" (admittedly that's quite a
mouthful). Something that carries the implication of "imbalance". The
system was configured in a way that turned out to be unsustainable.
The system was therefore forced to "restore sustainability" using the
only tool that remained. This is closely related to the failsafe.
As bad as xidStopLimit is, it won't always be the end of the world --
much depends on individual application requirements.
(I should probably also add in the commit message that the "shutdown" in the message was carried over to MXIDs when they arrived also in 2005).
Separately, there is a need to update a couple of other places to use
this new terminology. The documentation for vacuum_sailsafe_age and
vacuum_multixact_failsafe_age refer to "system-wide transaction ID
wraparound failure", which seems less than ideal, even without your
patch.Right, I'll have a look.
As you know, there is a more general problem with the use of the term
"wraparound" in the docs, and in the system itself (in places like
pg_stat_activity). Even the very basic terminology in this area is
needlessly scary. Terms like "VACUUM (to prevent wraparound)" are
uncomfortably close to "a race against time to avoid data corruption".
The system isn't ever supposed to corrupt data, even if misconfigured
(unless the misconfiguration is very low-level, such as "fsync=off").
Users should be able to take that much for granted.
I don't expect either of us to address that problem in the short term
-- the term "wraparound" is too baked-in for it to be okay to just
remove it overnight. But, it could still make sense for your patch (or
my own) to fully own the fact that "wraparound" is actually a
misnomer. At least when used in contexts like "to prevent wraparound"
(xidStopLimit actually "prevents wraparound", though we shouldn't say
anything about it in a place of prominence). Let's reassure users that
they should continue to take "we won't corrupt your data for no good
reason" for granted.
I think the docs would do well to have ordered steps for recovering from both XID and MXID exhaustion.
I had planned to address this with my ongoing work on the "Routine
Vacuuming" docs, but I think that you're right about the necessity of
addressing it as part of this patch.
These extra steps will be required whenever the problem is a leaked
prepared transaction, or something along those lines. That is
increasingly likely to turn out to be the underlying cause of entering
xidStopLimit, given the work we've done on VACUUM over the years. I
still think that "imbalance" is the right way to frame discussion of
xidStopLimit. After all, autovacuum/VACUUM will still spin its wheels
in a futile effort to "restore balance". So it's kinda still about
restoring imbalance IMV.
--
Peter Geoghegan
On Mon, May 1, 2023 at 2:30 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Sat, Apr 29, 2023 at 7:30 PM John Naylor
<john.naylor@enterprisedb.com> wrote:How about
-HINT: To avoid a database shutdown, [...] +HINT: To prevent XID exhaustion, [...]...and "MXID", respectively? We could explain in the docs that vacuum
and read-only queries still work "when XIDs have been exhausted", etc.
I think that that particular wording works in this example -- we *are*
avoiding XID exhaustion. But it still doesn't really address my
concern -- at least not on its own. I think that we need a term for
xidStopLimit mode (and perhaps multiStopLimit) itself. This is a
discrete state/mode that is associated with a specific mechanism.
Well, since you have a placeholder "xidStopLimit mode" in your other patch,
I'll confine my response to there. Inventing "modes" seems like an awkward
thing to backpatch, not to mention moving the goalposts. My modest goal
here is quite limited: to stop lying to our users about "not accepting
commands", and change tragically awful advice into sensible advice.
Here's my new idea:
-HINT: To avoid a database shutdown, [...]
+HINT: To prevent XID generation failure, [...]
Actually, I like "allocation" better, but the v8 patch now has "generation"
simply because one MXID message already has "generate" and I did it that
way before thinking too hard. I'd be okay with either one as long as it's
consistent.
(I should probably also add in the commit message that the "shutdown"
in the message was carried over to MXIDs when they arrived also in 2005).
Done
Separately, there is a need to update a couple of other places to use
this new terminology. The documentation for vacuum_sailsafe_age and
vacuum_multixact_failsafe_age refer to "system-wide transaction ID
wraparound failure", which seems less than ideal, even without your
patch.Right, I'll have a look.
Looking now, I'm even less inclined to invent new terminology in back
branches.
As you know, there is a more general problem with the use of the term
"wraparound" in the docs, and in the system itself (in places like
pg_stat_activity). Even the very basic terminology in this area is
needlessly scary. Terms like "VACUUM (to prevent wraparound)" are
uncomfortably close to "a race against time to avoid data corruption".
The system isn't ever supposed to corrupt data, even if misconfigured
(unless the misconfiguration is very low-level, such as "fsync=off").
Users should be able to take that much for granted.
Granted. Whatever form your rewrite ends up in, it could make a lot of
sense to then backpatch a few localized corrections. I wouldn't even object
to including a few substitutions of s/wraparound failure/allocation
failure/ where appropriate. Let's see how that shakes out first.
I think the docs would do well to have ordered steps for recovering
from both XID and MXID exhaustion.
I had planned to address this with my ongoing work on the "Routine
Vacuuming" docs, but I think that you're right about the necessity of
addressing it as part of this patch.
0003 is now a quick-and-dirty attempt at that, only in the docs. The MXID
part is mostly copy-pasted from the XID part, just to get something
together. I'd like to abbreviate that somehow.
--
John Naylor
EDB: http://www.enterprisedb.com
Attachments:
v8-0002-Stop-telling-users-to-run-VACUUM-in-a-single-user.patchtext/x-patch; charset=US-ASCII; name=v8-0002-Stop-telling-users-to-run-VACUUM-in-a-single-user.patchDownload
From 469d0e7123a16386e300a85a6bb08109e283b65c Mon Sep 17 00:00:00 2001
From: John Naylor <john.naylor@postgresql.org>
Date: Sat, 29 Apr 2023 14:23:50 +0700
Subject: [PATCH v8 2/3] Stop telling users to run VACUUM in a single-user mode
Single-user mode is almost always the worst thing to reach for in a
VACUUM emergency:
* Restarting in single user mode requires a shutdown checkpoint
* The user interface is completely different, and awful
* The buffer cache is completely cold
* The checkpointer, background writer and WAL writer are not running
* Without checkpoints WAL segments can not be rotated and reused
* Replication is not running, so after VACUUM is done and database
is started in normal mode, there is a huge backlog to replicate
* pg_stat_progress_vacuum is not available so there is no indication
of when the command will complete
* VACUUM VERBOSE doesn't work - there is no output from single-user
mode vacuum, with or without VERBOSE
If that weren't enough, it's also unsafe because the wraparound
limits are not enforced. It is by no means impossible to corrupt the
database by mistake, such as by a user running VACUUM FULL because it
sounds better.
As mentioned in commit XXXXXXXXX, the system is perfectly capable of
accepting commands when reaching xidStopLimit. Most VACUUM operations
will work normally, with one exception: A new XID is required when
truncating the relation if wal_level is above "minimal". As of v14
the failsafe mechanism disables truncation some time before reaching
xidStopLimit, so this is not an issue in practice.
By remaining in multi-user mode, users still have read-only access to
their database, they can use parallelism, they can use command line
utilities like vacuumdb, and they can remotely access the database
using normal clients.
The only reason to restart in single-user mode is to DROP or TRUNCATE
a table, when it is suspected that doing that would be faster than
vacuuming.
Also add an explicit note warning against using single-user mode.
Backpatch to v14, which is the first version with the VACUUM failsafe.
XXX We should consider v12-13 with "VACUUM (TRUNCATE off, INDEX_CLEANUP
off);", but it's not yet convenient to get to xidStopLimit before v14.
XXX We could consider v11 with careful instructions about redoing
VACUUMs in single-user mode when truncation is necessary.
Aleksander Alekseev, with some adjustments by me
Reviewed by Peter Geoghegan
Reported at various times by (at least) Hannu Krosing, Robert Haas,
and Andres Freund
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
Discussion: https://postgr.es/m/CA%2BTgmob1QCMJrHwRBK8HZtGsr%2B6cJANRQw2mEgJ9e%3DD%2Bz7cOsw%40mail.gmail.com
Discussion: https://postgr.es/m/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2BTgmoYPfofQmRtUan%3DA3aWE9wFsJaOFr%2BW_ys2pPkNPr-2FZw%40mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 32 +++++++++++++++++++++--------
src/backend/access/transam/varsup.c | 4 ++--
2 files changed, 25 insertions(+), 11 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 48d43cb339..116d6187cd 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -669,22 +669,36 @@ HINT: To prevent XID generation failure, execute a database-wide VACUUM in that
<programlisting>
ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
-HINT: Stop the postmaster and vacuum that database in single-user mode.
+HINT: Execute a database-wide VACUUM in that database.
</programlisting>
In this condition any transactions already in progress can continue,
but only read-only transactions can be started. Operations that
modify database records or truncate relations will fail.
-
- The three-million-transaction safety margin exists to let the
- administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However
- the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. See the
- <xref linkend="app-postgres"/> reference page for details about using
- single-user mode.
+ The <command>VACUUM</command> command can still be run normally to recover.
+ <!-- v12 and v13 need VACUUM (TRUNCATE off, INDEX_CLEANUP off); -->
+ <!-- v11 maybe mention that it will often succed, but will need S-U if truncation happens -->
</para>
+ <note>
+ <para>
+ In earlier versions it was required to stop the postmaster and
+ <command>VACUUM</command> the database in a single-user mode. There is no
+ need to use single-user mode anymore, and in fact it's strongly
+ discouraged: It increases downtime, makes monitoring impossible,
+ disables replication, bypasses safeguards against wraparound, etc.
+ </para>
+
+ <para>
+ The only reason to use single-user mode in an emergency is to e.g.
+ <command>TRUNCATE</command> or <command>DROP</command> unneeded tables
+ to avoid needing to <command>VACUUM</command> them.
+ The three-million-transaction safety margin exists to let the
+ administrator do this. See the <xref linkend="app-postgres"/>
+ reference page for details about using single-user mode.
+ </para>
+ </note>
+
<sect3 id="vacuum-for-multixact-wraparound">
<title>Multixacts and Wraparound</title>
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 672e6f0196..0d6c4925d5 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -128,14 +128,14 @@ GetNewTransactionId(bool isSubXact)
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("Execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("Execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
--
2.39.2
v8-0001-Correct-outdated-docs-and-messages-regarding-XID-.patchtext/x-patch; charset=US-ASCII; name=v8-0001-Correct-outdated-docs-and-messages-regarding-XID-.patchDownload
From 313019305d78a82b816fcfcbd79dc5c570dd31af Mon Sep 17 00:00:00 2001
From: John Naylor <john.naylor@postgresql.org>
Date: Fri, 28 Apr 2023 16:08:33 +0700
Subject: [PATCH v8 1/3] Correct outdated docs and messages regarding XID
limits
Previously, when approaching xidStopLimit or xidWrapLimit, log messages
would warn against a "database shutdown", and when it reached those
limits claimed that it "is not accepting commands". This language
originated in commit 60b2444cc when the xidStopLimit was added in
2005. At that time, even a trivial SELECT would have failed.
Commit 295e63983d in 2007 introduced virtual transaction IDs, which
allowed actual XIDs to be allocated lazily when it is necessary
to do so, such as when modifying database records. Since then, the
behavior at these limits is merely to refuse to allocate new XIDs,
so read-only queries can continue to be initiated.
The "database shutdown" message was also copied to the message
warning for multiWarnLimit when it was added.
This has been wrong for a very long time, so backpatch to all
supported branches.
Aleksander Alekseev, with some editing by me
Reviewed by Pavel Borisov and Peter Geoghegan
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 22 ++++++++++++----------
src/backend/access/transam/multixact.c | 4 ++--
src/backend/access/transam/varsup.c | 12 ++++++------
3 files changed, 20 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 9cf9d030a8..48d43cb339 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -656,7 +656,7 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
<programlisting>
WARNING: database "mydb" must be vacuumed within 39985967 transactions
-HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
+HINT: To prevent XID generation failure, execute a database-wide VACUUM in that database.
</programlisting>
(A manual <command>VACUUM</command> should fix the problem, as suggested by the
@@ -664,23 +664,25 @@ HINT: To avoid a database shutdown, execute a database-wide VACUUM in that data
superuser, else it will fail to process system catalogs and thus not
be able to advance the database's <structfield>datfrozenxid</structfield>.)
If these warnings are
- ignored, the system will shut down and refuse to start any new
- transactions once there are fewer than three million transactions left
- until wraparound:
+ ignored, the system will refuse to generate new XIDs once there are
+ fewer than three million transactions left until wraparound:
<programlisting>
-ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
+ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
+ In this condition any transactions already in progress can continue,
+ but only read-only transactions can be started. Operations that
+ modify database records or truncate relations will fail.
+
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However, since the system will not
- execute commands once it has gone into the safety shutdown mode,
+ required <command>VACUUM</command> commands. However
the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
- in single-user mode. See the <xref linkend="app-postgres"/> reference
- page for details about using single-user mode.
+ mode to execute <command>VACUUM</command>. See the
+ <xref linkend="app-postgres"/> reference page for details about using
+ single-user mode.
</para>
<sect3 id="vacuum-for-multixact-wraparound">
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index fe6698d5ff..ca0e038c36 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2335,7 +2335,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit - curMulti,
oldest_datname,
multiWrapLimit - curMulti),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent MultiXactId generation failure, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
@@ -2344,7 +2344,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit - curMulti,
oldest_datoid,
multiWrapLimit - curMulti),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent MultiXactId generation failure, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 334adac09e..672e6f0196 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -126,14 +126,14 @@ GetNewTransactionId(bool isSubXact)
if (oldest_datname)
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
@@ -148,14 +148,14 @@ GetNewTransactionId(bool isSubXact)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent XID generation failure, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent XID generation failure, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
@@ -463,14 +463,14 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
--
2.39.2
v8-0003-Rough-draft-of-complete-steps-to-recover-from-M-X.patchtext/x-patch; charset=US-ASCII; name=v8-0003-Rough-draft-of-complete-steps-to-recover-from-M-X.patchDownload
From ef5448479cea756c349ab2b46138477250cbac8e Mon Sep 17 00:00:00 2001
From: John Naylor <john.naylor@postgresql.org>
Date: Mon, 1 May 2023 19:07:37 +0700
Subject: [PATCH v8 3/3] Rough draft of complete steps to recover from (M)XID
generation failure
---
doc/src/sgml/maintenance.sgml | 53 +++++++++++++++++++++++++++++++++++
1 file changed, 53 insertions(+)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 116d6187cd..8b9f34074b 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -678,6 +678,20 @@ HINT: Execute a database-wide VACUUM in that database.
The <command>VACUUM</command> command can still be run normally to recover.
<!-- v12 and v13 need VACUUM (TRUNCATE off, INDEX_CLEANUP off); -->
<!-- v11 maybe mention that it will often succed, but will need S-U if truncation happens -->
+
+ However, it is first necessary to remove obstacles to advancing the XID horizon:
+
+ <orderedlist>
+ <listitem>
+ <simpara>Commit or rollback any prepared transactions</simpara>
+ </listitem>
+ <listitem>
+ <simpara>Terminate any sessions that might have open transactions</simpara>
+ </listitem>
+ <listitem>
+ <simpara>Drop any old replication slots</simpara>
+ </listitem>
+ </orderedlist>
</para>
<note>
@@ -763,6 +777,45 @@ HINT: Execute a database-wide VACUUM in that database.
have the oldest multixact-age. Both of these kinds of aggressive
scans will occur even if autovacuum is nominally disabled.
</para>
+
+<!-- WIP: Lot's of stuff identical to the XID case, can we abbreviate? -->
+ <para>
+ If for some reason autovacuum fails to clear old MXIDs from a table, the
+ system will begin to emit warning messages like this when the database's
+ oldest XIDs reach forty million transactions from the wraparound point:
+
+<programlisting>
+WARNING: database "mydb" must be vacuumed within 39985967 transactions
+HINT: To prevent MultiXactId generation failure, execute a database-wide VACUUM in that database.
+</programlisting>
+
+<!-- WIP: not sure about system catalogs -->
+ (A manual <command>VACUUM</command> should fix the problem, as suggested by the
+ hint; but note that the <command>VACUUM</command> must be performed by a
+ superuser, else it will fail to process system catalogs and thus not
+ be able to advance the database's <structfield>datfrozenxid</structfield>.)
+ If these warnings are
+ ignored, the system will refuse to generate new MXIDs once there are
+ fewer than three million left until wraparound:
+
+<programlisting>
+ERROR: database is not accepting commands that generate new MultiXactIds to avoid wraparound data loss in database "mydb"
+HINT: Execute a database-wide VACUUM in that database.
+</programlisting>
+ </para>
+
+ <para>
+ To restore normal operation, it is first necessary to remove obstacles to advancing the MXID horizon:
+ <orderedlist>
+ <listitem>
+ <simpara>Commit or rollback each prepared transaction that might appear in a multixact</simpara>
+ </listitem>
+ <listitem>
+ <simpara>Resolve each transaction that might appear in a multixact</simpara>
+ </listitem>
+ </orderedlist>
+ </para>
+
</sect3>
</sect2>
--
2.39.2
On Mon, May 1, 2023 at 5:34 AM John Naylor <john.naylor@enterprisedb.com> wrote:
Well, since you have a placeholder "xidStopLimit mode" in your other patch, I'll confine my response to there. Inventing "modes" seems like an awkward thing to backpatch, not to mention moving the goalposts. My modest goal here is quite limited: to stop lying to our users about "not accepting commands", and change tragically awful advice into sensible advice.
I can't argue with that.
Here's my new idea:
-HINT: To avoid a database shutdown, [...] +HINT: To prevent XID generation failure, [...]Actually, I like "allocation" better, but the v8 patch now has "generation" simply because one MXID message already has "generate" and I did it that way before thinking too hard. I'd be okay with either one as long as it's consistent.
WFM.
Granted. Whatever form your rewrite ends up in, it could make a lot of sense to then backpatch a few localized corrections. I wouldn't even object to including a few substitutions of s/wraparound failure/allocation failure/ where appropriate. Let's see how that shakes out first.
Makes sense.
I think the docs would do well to have ordered steps for recovering from both XID and MXID exhaustion.
I had planned to address this with my ongoing work on the "Routine
Vacuuming" docs, but I think that you're right about the necessity of
addressing it as part of this patch.0003 is now a quick-and-dirty attempt at that, only in the docs. The MXID part is mostly copy-pasted from the XID part, just to get something together. I'd like to abbreviate that somehow.
Yeah, the need to abbreviate statements about MultiXact IDs by saying
that they work analogously to XIDs in some particular respect
is...another thing that makes this tricky.
I don't think that Multis are fundamentally different to XIDs. I
believe that the process through which VACUUM establishes its
OldestMXact cutoff can be pessimistic compared to OldestXmin, but I
don't think that it changes the guidance you'll need to give here.
VACUUM should always be able to advance relminmxid right up until
OldestMXact, if that's what the user insists on. For example, VACUUM
FREEZE sometimes allocates new Multis, just to be able to do that.
Obviously there are certain things that can hold back OldestMXact by a
wildly excessive amount. But I don't think that there is anything that
can hold back OldestMXact by a wildly excessive amount that won't more
or less do the same thing to OldestXmin.
--
Peter Geoghegan
On Mon, May 1, 2023 at 7:55 PM Peter Geoghegan <pg@bowt.ie> wrote:
Obviously there are certain things that can hold back OldestMXact by a
wildly excessive amount. But I don't think that there is anything that
can hold back OldestMXact by a wildly excessive amount that won't more
or less do the same thing to OldestXmin.
Actually, it's probably possible for a transaction that only has a
virtual transaction ID to call MultiXactIdSetOldestVisible(), which
will then have the effect of holding back OldestMXact without also
holding back OldestXmin (in READ COMMITTED mode).
Will have to check to make sure, but that won't happen today.
--
Peter Geoghegan
On Tue, May 2, 2023 at 9:55 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, May 1, 2023 at 5:34 AM John Naylor <john.naylor@enterprisedb.com>
wrote:
0003 is now a quick-and-dirty attempt at that, only in the docs. The
MXID part is mostly copy-pasted from the XID part, just to get something
together. I'd like to abbreviate that somehow.
Yeah, the need to abbreviate statements about MultiXact IDs by saying
that they work analogously to XIDs in some particular respect
is...another thing that makes this tricky.
Then it sounds like they should stay separate. A direct copy-paste is not
good for style, so I will add things like:
- If for some reason autovacuum fails to clear old MXIDs from a table, the
+ As in the case with XIDs, it is possible for autovacuum to fail to [...]
It might least be good for readability to gloss over the warning and only
quote the MXID limit error message, but we'll have to see how it looks.
--
John Naylor
EDB: http://www.enterprisedb.com
On Tue, May 2, 2023 at 6:46 PM John Naylor <john.naylor@enterprisedb.com> wrote:
It might least be good for readability to gloss over the warning and only quote the MXID limit error message, but we'll have to see how it looks.
Apparently you expect me to join you in pretending that you didn't
lambast my review on this thread less than 24 hours ago [1]/messages/by-id/CAFBsxsGJMp43QO2cLAh0==ueYVL35pbbEHeXZ0cnZkU=q8sFkg@mail.gmail.com -- Peter Geoghegan. I happen
to believe that this particular patch is of great strategic
importance, so I'll admit that I thought about it for a second. But
just a second -- I have more self-respect than that.
That's not the only reason, though. I also genuinely don't have the
foggiest notion what was behind what you said. In particular, this
part still makes zero sense to me:
"Claim that others are holding you back, and then try to move the
goalposts in their work"
Let me get this straight: "Moving the goalposts of their work" refers
to something *I* did to *you*, on *this* thread...right?
If anything, I'm biased in *favor* of this patch. The fact that you
seem to think that I was being obstructionist just doesn't make any
sense to me, at all. I really don't know where to go from there. I'm
not so much upset as baffled.
[1]: /messages/by-id/CAFBsxsGJMp43QO2cLAh0==ueYVL35pbbEHeXZ0cnZkU=q8sFkg@mail.gmail.com -- Peter Geoghegan
--
Peter Geoghegan
On Wed, May 3, 2023 at 10:04 AM Peter Geoghegan <pg@bowt.ie> wrote:
That's not the only reason, though. I also genuinely don't have the
foggiest notion what was behind what you said. In particular, this
part still makes zero sense to me:"Claim that others are holding you back, and then try to move the
goalposts in their work"
I went to go find the phrase that I thought I was reacted to, and ...
nothing. I am also baffled. My comment was inexcusable.
--
John Naylor
EDB: http://www.enterprisedb.com
On Wed, May 3, 2023 at 12:30 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
I went to go find the phrase that I thought I was reacted to, and ... nothing. I am also baffled. My comment was inexcusable.
I'd quite like to drop this topic, and get on with the work at hand.
But before I do that, I ask you to consider one thing: if you were
mistaken about my words (or their intent) on this occasion, isn't it
also possible that it wasn't the first time?
I never had the opportunity to sit down to talk with you face to face
before now. If things had been different (if we managed to talk at one
of the PGCons before COVID, say), then maybe this incident would have
happened in just the same way. I can't help but think that some face
time would have prevented the whole episode, though.
You have every right to dislike me on a personal level, of course, but
if you do then I'd very much prefer that it be due to one of my actual
flaws. I'm not a petty man -- I don't resent the success of others.
I've always thought that you do rather good work. Plus I'm just not in
the habit of obstructing things that I directly benefit from.
--
Peter Geoghegan
On Thu, May 4, 2023 at 12:59 AM Peter Geoghegan <pg@bowt.ie> wrote:
I'd quite like to drop this topic, and get on with the work at hand.
I'd be grateful, and the other points you made are, of course, valid.
--
John Naylor
EDB: http://www.enterprisedb.com
Attached is v9, which is mostly editing the steps for restoring normal
operation, which are in 0003 now but will be squashed into 0002. Comments
to polish the wording welcome.
--
John Naylor
EDB: http://www.enterprisedb.com
Attachments:
v9-0003-Rough-draft-of-complete-steps-to-recover-from-M-X.patchtext/x-patch; charset=US-ASCII; name=v9-0003-Rough-draft-of-complete-steps-to-recover-from-M-X.patchDownload
From 0e9d6ea72216b196d37de4629736c0cf34e7cd5c Mon Sep 17 00:00:00 2001
From: John Naylor <john.naylor@postgresql.org>
Date: Sat, 13 May 2023 11:03:40 +0700
Subject: [PATCH v9 3/3] Rough draft of complete steps to recover from (M)XID
generation failure
TODO: squash with previous
---
doc/src/sgml/maintenance.sgml | 61 ++++++++++++++++++++++++++++++++++-
1 file changed, 60 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 45d6cd1815..fee88cb647 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -675,7 +675,25 @@ HINT: Execute a database-wide VACUUM in that database.
In this condition any transactions already in progress can continue,
but only read-only transactions can be started. Operations that
modify database records or truncate relations will fail.
- The <command>VACUUM</command> command can still be run normally to recover.
+ The <command>VACUUM</command> command can still be run normally.
+ To restore normal operation:
+
+ <orderedlist>
+ <listitem>
+ <simpara>Commit or rollback any prepared transactions.</simpara>
+ </listitem>
+ <listitem>
+ <simpara>Terminate any sessions that might have open transactions.</simpara>
+ </listitem>
+ <listitem>
+ <simpara>Drop any old replication slots.</simpara>
+ </listitem>
+ <listitem>
+ <simpara>Ensure autovacuum is running, and execute a database-wide <command>VACUUM</command>.
+ To reduce the time required, it as also possible to issue manual <command>VACUUM</command>
+ commands on the tables where <structfield>relminxid</structfield> is oldest.</simpara>
+ </listitem>
+ </orderedlist>
</para>
<note>
@@ -761,6 +779,47 @@ HINT: Execute a database-wide VACUUM in that database.
have the oldest multixact-age. Both of these kinds of aggressive
scans will occur even if autovacuum is nominally disabled.
</para>
+
+ <para>
+ Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
+ system will begin to emit warning messages like this when the database's
+ oldest MXIDs reach forty million transactions from the wraparound point:
+
+<programlisting>
+WARNING: database "mydb" must be vacuumed within 39985967 transactions
+HINT: To prevent MultiXactId generation failure, execute a database-wide VACUUM in that database.
+</programlisting>
+
+ (A manual <command>VACUUM</command> should fix the problem, as suggested by the
+ hint; but note that the <command>VACUUM</command> must be performed by a
+ superuser, else it will fail to process system catalogs and thus not
+ be able to advance the database's <structfield>datfrozenxid</structfield>.)
+ If these warnings are ignored, the system will refuse to generate new
+ MXIDs once there are fewer than three million left until wraparound:
+
+<programlisting>
+ERROR: database is not accepting commands that generate new MultiXactIds to avoid wraparound data loss in database "mydb"
+HINT: Execute a database-wide VACUUM in that database.
+</programlisting>
+ </para>
+
+ <para>
+ To restore normal operation:
+ <orderedlist>
+ <listitem>
+ <simpara>Commit or rollback each prepared transaction that might appear in a multixact.</simpara>
+ </listitem>
+ <listitem>
+ <simpara>Resolve each transaction that might appear in a multixact.</simpara>
+ </listitem>
+ <listitem>
+ <simpara>Ensure autovacuum is running, and execute a database-wide <command>VACUUM</command>.
+ To reduce the time required, it as also possible to issue manual <command>VACUUM</command>
+ commands on the tables where <structfield>relminmxid</structfield> is oldest.</simpara>
+ </listitem>
+ </orderedlist>
+ </para>
+
</sect3>
</sect2>
--
2.39.2
v9-0001-Correct-outdated-docs-and-messages-regarding-XID-.patchtext/x-patch; charset=US-ASCII; name=v9-0001-Correct-outdated-docs-and-messages-regarding-XID-.patchDownload
From 267609882a0be2764bc33fc289c0a962d47643c4 Mon Sep 17 00:00:00 2001
From: John Naylor <john.naylor@postgresql.org>
Date: Fri, 28 Apr 2023 16:08:33 +0700
Subject: [PATCH v9 1/3] Correct outdated docs and messages regarding XID/MXID
limits
Previously, when approaching xidStopLimit or xidWrapLimit, log messages
would warn against a "database shutdown", and when it reached those
limits claimed that it "is not accepting commands". This language
originated in commit 60b2444cc when the xidStopLimit was added in
2005. At that time, even a trivial SELECT would have failed.
Commit 295e63983d in 2007 introduced virtual transaction IDs, which
allowed actual XIDs to be allocated lazily when it is necessary
to do so, such as when modifying database records. Since then, the
behavior at these limits is merely to refuse to allocate new XIDs,
so read-only queries can continue to be initiated.
The "database shutdown" message was also copied to the message
warning for multiWarnLimit when it was added.
This has been wrong for a very long time, so backpatch to all
supported branches.
Aleksander Alekseev, with some editing by me
Reviewed by Pavel Borisov and Peter Geoghegan
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 22 ++++++++++++----------
src/backend/access/transam/multixact.c | 4 ++--
src/backend/access/transam/varsup.c | 12 ++++++------
3 files changed, 20 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 9cf9d030a8..48d43cb339 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -656,7 +656,7 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
<programlisting>
WARNING: database "mydb" must be vacuumed within 39985967 transactions
-HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
+HINT: To prevent XID generation failure, execute a database-wide VACUUM in that database.
</programlisting>
(A manual <command>VACUUM</command> should fix the problem, as suggested by the
@@ -664,23 +664,25 @@ HINT: To avoid a database shutdown, execute a database-wide VACUUM in that data
superuser, else it will fail to process system catalogs and thus not
be able to advance the database's <structfield>datfrozenxid</structfield>.)
If these warnings are
- ignored, the system will shut down and refuse to start any new
- transactions once there are fewer than three million transactions left
- until wraparound:
+ ignored, the system will refuse to generate new XIDs once there are
+ fewer than three million transactions left until wraparound:
<programlisting>
-ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
+ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
+ In this condition any transactions already in progress can continue,
+ but only read-only transactions can be started. Operations that
+ modify database records or truncate relations will fail.
+
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However, since the system will not
- execute commands once it has gone into the safety shutdown mode,
+ required <command>VACUUM</command> commands. However
the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
- in single-user mode. See the <xref linkend="app-postgres"/> reference
- page for details about using single-user mode.
+ mode to execute <command>VACUUM</command>. See the
+ <xref linkend="app-postgres"/> reference page for details about using
+ single-user mode.
</para>
<sect3 id="vacuum-for-multixact-wraparound">
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index fe6698d5ff..ca0e038c36 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2335,7 +2335,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit - curMulti,
oldest_datname,
multiWrapLimit - curMulti),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent MultiXactId generation failure, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
@@ -2344,7 +2344,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit - curMulti,
oldest_datoid,
multiWrapLimit - curMulti),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent MultiXactId generation failure, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 334adac09e..672e6f0196 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -126,14 +126,14 @@ GetNewTransactionId(bool isSubXact)
if (oldest_datname)
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u",
+ errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
@@ -148,14 +148,14 @@ GetNewTransactionId(bool isSubXact)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent XID generation failure, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent XID generation failure, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
@@ -463,14 +463,14 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
--
2.39.2
v9-0002-Stop-telling-users-to-run-VACUUM-in-a-single-user.patchtext/x-patch; charset=US-ASCII; name=v9-0002-Stop-telling-users-to-run-VACUUM-in-a-single-user.patchDownload
From aecffc481503f471d8bc1ab0025bc243d126cb47 Mon Sep 17 00:00:00 2001
From: John Naylor <john.naylor@postgresql.org>
Date: Sat, 29 Apr 2023 14:23:50 +0700
Subject: [PATCH v9 2/3] Stop telling users to run VACUUM in a single-user mode
Single-user mode is almost always the worst thing to reach for in a
VACUUM emergency:
* Restarting in single user mode requires a shutdown checkpoint
* The user interface is completely different, and awful
* The buffer cache is completely cold
* The checkpointer, background writer and WAL writer are not running
* Without checkpoints WAL segments can not be rotated and reused
* Replication is not running, so after VACUUM is done and database
is started in normal mode, there is a huge backlog to replicate
* pg_stat_progress_vacuum is not available so there is no indication
of when the command will complete
* VACUUM VERBOSE doesn't work - there is no output from single-user
mode vacuum, with or without VERBOSE
If that weren't enough, it's also unsafe because the wraparound
limits are not enforced. It is by no means impossible to corrupt the
database by mistake, such as by a user running VACUUM FULL because it
sounds better.
As mentioned in commit XXXXXXXXX, the system is perfectly capable of
accepting commands when reaching xidStopLimit. Most VACUUM operations
will work normally, with one exception: A new XID is required when
truncating the relation if wal_level is above "minimal". As of v14
the failsafe mechanism disables truncation some time before reaching
xidStopLimit, so this is not an issue in practice.
By remaining in multi-user mode, users still have read-only access to
their database, they can use parallelism, they can use command line
utilities like vacuumdb, and they can remotely access the database
using normal clients.
The only reason to restart in single-user mode is to DROP or TRUNCATE
a table, when it is suspected that doing that would be faster than
vacuuming.
Also add an explicit note warning against using single-user mode.
Backpatch to v14, which is the first version with the VACUUM failsafe.
XXX We should consider v12-13 with "VACUUM (TRUNCATE off, INDEX_CLEANUP
off);", but it's not yet convenient to get to xidStopLimit before v14.
XXX We could consider v11 with careful instructions about redoing
VACUUMs in single-user mode when truncation is necessary.
Aleksander Alekseev, with some adjustments by me
Reviewed by Peter Geoghegan
Reported at various times by (at least) Hannu Krosing, Robert Haas,
and Andres Freund
Discussion: https://postgr.es/m/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
Discussion: https://postgr.es/m/CA%2BTgmob1QCMJrHwRBK8HZtGsr%2B6cJANRQw2mEgJ9e%3DD%2Bz7cOsw%40mail.gmail.com
Discussion: https://postgr.es/m/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2BTgmoYPfofQmRtUan%3DA3aWE9wFsJaOFr%2BW_ys2pPkNPr-2FZw%40mail.gmail.com
---
doc/src/sgml/maintenance.sgml | 30 ++++++++++++++++++++---------
src/backend/access/transam/varsup.c | 4 ++--
2 files changed, 23 insertions(+), 11 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 48d43cb339..45d6cd1815 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -669,22 +669,34 @@ HINT: To prevent XID generation failure, execute a database-wide VACUUM in that
<programlisting>
ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
-HINT: Stop the postmaster and vacuum that database in single-user mode.
+HINT: Execute a database-wide VACUUM in that database.
</programlisting>
In this condition any transactions already in progress can continue,
but only read-only transactions can be started. Operations that
modify database records or truncate relations will fail.
-
- The three-million-transaction safety margin exists to let the
- administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However
- the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. See the
- <xref linkend="app-postgres"/> reference page for details about using
- single-user mode.
+ The <command>VACUUM</command> command can still be run normally to recover.
</para>
+ <note>
+ <para>
+ In earlier versions it was required to stop the postmaster and
+ <command>VACUUM</command> the database in a single-user mode. There is no
+ need to use single-user mode anymore, and in fact it's strongly
+ discouraged: It increases downtime, makes monitoring impossible,
+ disables replication, bypasses safeguards against wraparound, etc.
+ </para>
+
+ <para>
+ The only reason to use single-user mode in an emergency is to e.g.
+ <command>TRUNCATE</command> or <command>DROP</command> unneeded tables
+ to avoid needing to <command>VACUUM</command> them.
+ The three-million-transaction safety margin exists to let the
+ administrator do this. See the <xref linkend="app-postgres"/>
+ reference page for details about using single-user mode.
+ </para>
+ </note>
+
<sect3 id="vacuum-for-multixact-wraparound">
<title>Multixacts and Wraparound</title>
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 672e6f0196..0d6c4925d5 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -128,14 +128,14 @@ GetNewTransactionId(bool isSubXact)
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("Execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("Execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
--
2.39.2
On Fri, May 12, 2023 at 9:14 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
Attached is v9, which is mostly editing the steps for restoring normal operation, which are in 0003 now but will be squashed into 0002. Comments to polish the wording welcome.
I'll try to get you more feedback on this soon.
BTW, Google cloud already just instruct their users to ignore the
xidStopLimit HINT about single user mode:
https://cloud.google.com/sql/docs/postgres/txid-wraparound
I checked with archive.org. This directive to just ignore the HINT
appeared for the first time no later than December 2021. Fixing this
in Postgres is long overdue.
--
Peter Geoghegan
On Sun, May 14, 2023 at 6:06 PM Peter Geoghegan <pg@bowt.ie> wrote:
BTW, Google cloud already just instruct their users to ignore the
xidStopLimit HINT about single user mode:
I read this just today, and was reminded of this thread:
It reads:
"1. Transaction ID wraparound: PostgreSQL transaction IDs or XIDs are
32-bit unsigned integers that are assigned to each transaction and
also get incremented. When they reach their maximum value, it would
wrap around to zero (similar to a ring buffer) and can lead to data
corruption."
--
Peter Geoghegan
On 20.09.23 05:41, Peter Geoghegan wrote:
On Sun, May 14, 2023 at 6:06 PM Peter Geoghegan <pg@bowt.ie> wrote:
BTW, Google cloud already just instruct their users to ignore the
xidStopLimit HINT about single user mode:I read this just today, and was reminded of this thread:
It reads:
"1. Transaction ID wraparound: PostgreSQL transaction IDs or XIDs are
32-bit unsigned integers that are assigned to each transaction and
also get incremented. When they reach their maximum value, it would
wrap around to zero (similar to a ring buffer) and can lead to data
corruption."
What is the status of this patch discussion now? It had been set as
Ready for Committer for some months. Do these recent discussions
invalidate that? Does it need more discussion?
On Sun, Oct 1, 2023 at 11:46 AM Peter Eisentraut <peter@eisentraut.org> wrote:
What is the status of this patch discussion now? It had been set as
Ready for Committer for some months. Do these recent discussions
invalidate that? Does it need more discussion?
I don't think that recent discussion invalidated anything. I meant to
follow-up on investigating the extent to which anything could hold up
OldestMXact without also holding up OldestXmin/removable cutoff, but
that doesn't seem essential.
This patch does indeed seem "ready for committer". John?
--
Peter Geoghegan
Hi!
On Mon, 2 Oct 2023 at 03:34, Peter Geoghegan <pg@bowt.ie> wrote:
On Sun, Oct 1, 2023 at 11:46 AM Peter Eisentraut <peter@eisentraut.org> wrote:
What is the status of this patch discussion now? It had been set as
Ready for Committer for some months. Do these recent discussions
invalidate that? Does it need more discussion?I don't think that recent discussion invalidated anything. I meant to
follow-up on investigating the extent to which anything could hold up
OldestMXact without also holding up OldestXmin/removable cutoff, but
that doesn't seem essential.This patch does indeed seem "ready for committer". John?
--
Peter Geoghegan
FWIW I think the patch is still in good shape and worth to be committed.
Regards,
Pavel Borisov
Supabase
On Mon, Oct 2, 2023 at 11:52 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
FWIW I think the patch is still in good shape and worth to be committed.
I'm also pretty happy with these patches and would like to see at
least 0001 and 0002 committed, and probably 0003 as well. I am,
however, -1 on back-patching. Perhaps that is overly cautious, but I
don't like changing existing messages in back-branches. It will break
translations, and potentially monitoring scripts, etc.
If John's not available to take this forward, I can volunteer as
substitute committer, unless Peter or Peter would like to handle it.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Oct 2, 2023 at 1:25 PM Robert Haas <robertmhaas@gmail.com> wrote:
I'm also pretty happy with these patches and would like to see at
least 0001 and 0002 committed, and probably 0003 as well. I am,
however, -1 on back-patching. Perhaps that is overly cautious, but I
don't like changing existing messages in back-branches. It will break
translations, and potentially monitoring scripts, etc.If John's not available to take this forward, I can volunteer as
substitute committer, unless Peter or Peter would like to handle it.
If you're willing to take over as committer here, I'll let the issue
of backpatching go.
I only ask that you note why you've not backpatched in the commit message.
--
Peter Geoghegan
On Wed, Oct 4, 2023 at 8:07 AM Peter Geoghegan <pg@bowt.ie> wrote:
If you're willing to take over as committer here, I'll let the issue
of backpatching go.I only ask that you note why you've not backpatched in the commit message.
Will do, but see also the last point below.
I have looked over these patches in some detail and here are my thoughts:
- I find the use of the word "generate" in error messages slightly
odd. I think it's reasonable given the existing precedent, but the
word I would have picked is "assign", which I see is what Aleksander
actually had in v1. How would people feel about changing the two
existing messages that say "database is not accepting commands that
generate new MultiXactIds to avoid wraparound data loss ..." to use
"assign" instead, and then make the new messages match that?
- I think that 0002 needs a bit of wordsmithing. I will work on that.
In particular, I don't like this sentence: "It increases downtime,
makes monitoring impossible, disables replication, bypasses safeguards
against wraparound, etc." While there's nothing untrue there, it feels
more like a sentence from a pgsql-hackers email where most people
participating in the discussion understand the general contours of the
problem already than like polished documentation that really lays
things out methodically.
- I'm somewhat inclined to have a go at restructuring these patches a
bit so that some of the documentation changes can potentially be
back-patched without back-patching the message changes. Even if we
eventually decide to back-patch everything or nothing, there are
wording adjustments spread across all 3 patches that seem somewhat
independent of the changes to the server messages. I think it would be
clearer to have one patch that is mostly about documentation wording
changes, and a second one that is about changing the server messages
and then making documentation changes that are directly dependent on
those message changes. And I might also be inclined to back-patch the
former patch as far as it makes sense to do so, while leaving the
latter one master-only.
Comments?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Oct 12, 2023 at 8:54 AM Robert Haas <robertmhaas@gmail.com> wrote:
- I find the use of the word "generate" in error messages slightly
odd. I think it's reasonable given the existing precedent, but the
word I would have picked is "assign", which I see is what Aleksander
actually had in v1. How would people feel about changing the two
existing messages that say "database is not accepting commands that
generate new MultiXactIds to avoid wraparound data loss ..." to use
"assign" instead, and then make the new messages match that?
WFM.
- I think that 0002 needs a bit of wordsmithing. I will work on that.
In particular, I don't like this sentence: "It increases downtime,
makes monitoring impossible, disables replication, bypasses safeguards
against wraparound, etc." While there's nothing untrue there, it feels
more like a sentence from a pgsql-hackers email where most people
participating in the discussion understand the general contours of the
problem already than like polished documentation that really lays
things out methodically.
I agree.
- I'm somewhat inclined to have a go at restructuring these patches a
bit so that some of the documentation changes can potentially be
back-patched without back-patching the message changes. Even if we
eventually decide to back-patch everything or nothing, there are
wording adjustments spread across all 3 patches that seem somewhat
independent of the changes to the server messages. I think it would be
clearer to have one patch that is mostly about documentation wording
changes, and a second one that is about changing the server messages
and then making documentation changes that are directly dependent on
those message changes. And I might also be inclined to back-patch the
former patch as far as it makes sense to do so, while leaving the
latter one master-only.
No objections from me.
--
Peter Geoghegan
On Thu, Oct 12, 2023 at 12:01 PM Peter Geoghegan <pg@bowt.ie> wrote:
No objections from me.
Here is a doc-only patch that I think could be back-patched as far as
emergency mode exists. It combines all of the wording changes to the
documentation from v1-v3 of the previous version, but without changing
the message text that is quoted in the documentation, and without
adding more instances of similar message texts to the documentation,
and with a bunch of additional hacking by me. Some things I changed:
- I made it so that the MXID section refers back to the XID section
instead of duplicating it, but with a short list of differences.
- I weakened the existing claim that says you must be a superuser or
VACUUM definitely won't fix it to say instead that you SHOULD run
VACUUM as the superuser, because the former is false and the latter is
true.
- I made the list of steps for recovering more explicit.
- I split out the bit about running autovacuum in the affected
database into a separate step to be performed after VACUUM for
continued good operation, rather than a necessary ingredient in
recovery, because it isn't.
- A bit of other minor rejiggering.
I'm not forgetting about the rest of the proposed patch set, or the
change I proposed earlier. I'm just posting this much now because this
is how far I got today, and it would be useful to get comments before
I go further. I think the residual portion of the patch set not
included in this documentation patch will be quite small, and I think
that's a good thing, but again, I don't intend to blow that off.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v10-0001-Update-the-documentation-on-recovering-from-M-XI.patchapplication/octet-stream; name=v10-0001-Update-the-documentation-on-recovering-from-M-XI.patchDownload
From e37ba699adc42b7cdd568a575558b68d6288bbad Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 12 Oct 2023 16:01:57 -0400
Subject: [PATCH v10] Update the documentation on recovering from (M)XID
exhaustion.
The old documentation encourages entering single-user mode for no
reason, which is a bad plan in most cases. Instead, discourage users
from doing that, and explain the limited cases in which it may be
desirable.
The old documentation claims that running VACUUM as anyone but the
superuser can't possibly work, which is not really true, because it
might be that some other user has enough permissions to VACUUM all
the tables that matter. Weaken the language just a bit.
The old documentation claims that you can't run any commands
when near XID exhaustion, which is false because you can still
run commands that don't require an XID, like a SELECT without a
locking clause.
The old documentation doesn't clearly explain that it's a good idea
to get rid of prepared transactons, long-running transactions, and
replication slots that are preventing (M)XID horizon advancement.
Spell out the steps to do that.
Also, discourage the use of VACUUM FULL and VACUUM FREEZE in
this type of scenario.
Alexander Alekseev, John Naylor, Robert Haas, reviewed at various
times by Peter Geoghegan, Hannu Krosing, and Andres Freund.
---
doc/src/sgml/maintenance.sgml | 112 +++++++++++++++++++++++++++++-----
1 file changed, 97 insertions(+), 15 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 3018d8f64c..66f2c6a02e 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -660,29 +660,79 @@ HINT: To avoid a database shutdown, execute a database-wide VACUUM in that data
</programlisting>
(A manual <command>VACUUM</command> should fix the problem, as suggested by the
- hint; but note that the <command>VACUUM</command> must be performed by a
- superuser, else it will fail to process system catalogs and thus not
- be able to advance the database's <structfield>datfrozenxid</structfield>.)
- If these warnings are
- ignored, the system will shut down and refuse to start any new
- transactions once there are fewer than three million transactions left
- until wraparound:
+ hint; but note that the <command>VACUUM</command> should be performed by a
+ superuser, else it will fail to process system catalogs, which prevent it from
+ being able to advance the database's <structfield>datfrozenxid</structfield>.)
+ If these warnings are ignored, the system will refuse to assign new XIDs once
+ there are fewer than three million transactions left until wraparound:
<programlisting>
ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
- The three-million-transaction safety margin exists to let the
- administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However, since the system will not
- execute commands once it has gone into the safety shutdown mode,
- the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
- in single-user mode. See the <xref linkend="app-postgres"/> reference
- page for details about using single-user mode.
+ In this condition any transactions already in progress can continue,
+ but only read-only transactions can be started. Operations that
+ modify database records or truncate relations will fail.
+ The <command>VACUUM</command> command can still be run normally.
+ Contrary to what the hint states, it is not necessary or desirable to stop the
+ postmaster or enter single user-mode in order to restore normal operation.
+ Instead, follow these steps:
+
+ <orderedlist>
+ <listitem>
+ <simpara>Resolve old prepared transactions. You can find these by checking
+ <link linkend="view-pg-prepared-xacts">pg_prepared_xacts</link> for rows where
+ <literal>age(transactionid)</literal> is large. Such transactions should be
+ committed or rolled back.</simpara>
+ </listitem>
+ <listitem>
+ <simpara>End long-running open transactions. You can find these by checking
+ <link linkend="monitoring-pg-stat-activity-view">pg_stat_activity</link> for rows where
+ <literal>age(backend_xid)</literal> or <literal>age(backend_xmin)</literal> is
+ large. Such transactions should be committed or rolled back, or the session
+ can be terminated using <literal>pg_terminate_backend</literal>.</simpara>
+ </listitem>
+ <listitem>
+ <simpara>Drop any old replication slots. Use
+ <link linkend="monitoring-pg-stat-replication-view">pg_stat_replication</link> to
+ find slots where <literal>age(xmin)</literal> or <literal>age(catalog_xmin)</literal>
+ is large. In many cases, such slots were created for replication to servers that no
+ longer exist, or that have been down for a long time. If you drop a slot for a server
+ that still exists and might still try to connect to that slot, that replica may
+ need to be rebuilt.</simpara>
+ </listitem>
+ <listitem>
+ <simpara>Execute <command>VACUUM</command> in the target database. A database-wide
+ <literal>VACUUM</literal> is simplest; to reduce the time required, it as also possible
+ to issue manual <command>VACUUM</command> commands on the tables where
+ <structfield>relminxid</structfield> is oldest. Do not use <literal>VACUUM FULL</literal>
+ in this scenario, because it requires an XID and will therefore fail, except in super-user
+ mode, where it will instead consume an XID and thus increase the risk of transaction ID
+ wraparound. Do not use <literal>VACUUM FREEZE</literal> either, because it will do
+ more than the minimum amount of work required to restore normal operation.</simpara>
+ </listitem>
+ <listitem>
+ <simpara>Once normal operation is restored, ensure that autovacuum is properly configured
+ in the target database in order to avoid future problems.</simpara>
+ </listitem>
+ </orderedlist>
</para>
+ <note>
+ <para>
+ In earlier versions, it was sometimes necessary to stop the postmaster and
+ <command>VACUUM</command> the database in a single-user mode. In typical scenarios, this
+ is no longer necessary, and should be avoided whenever possible, since it involves taking
+ the system down. It is also riskier, since it disables transaction ID wraparound safeguards
+ that are designed to prevent data loss. The only reason to use single-user mode in this
+ scenario is if you wish to <command>TRUNCATE</command> or <command>DROP</command> unneeded
+ tables to avoid needing to <command>VACUUM</command> them. The three-million-transaction
+ safety margin exists to let the administrator do this. See the
+ <xref linkend="app-postgres"/> reference page for details about using single-user mode.
+ </para>
+ </note>
+
<sect3 id="vacuum-for-multixact-wraparound">
<title>Multixacts and Wraparound</title>
@@ -747,6 +797,38 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
have the oldest multixact-age. Both of these kinds of aggressive
scans will occur even if autovacuum is nominally disabled.
</para>
+
+ <para>
+ Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
+ system will begin to emit warning messages when the database's oldest MXIDs reach forty
+ million transactions from the wraparound point. And, just as an the XID case, if these
+ warnings are ignored, the system will refuse to generate new MXIDs once there are fewer
+ than three million left until wraparound.
+ </para>
+
+ <para>
+ Normal operation when MXIDs are exhausted can be restored in much the same way as
+ when XIDs are exhausted. Follow the same steps in the previous section, but with the
+ following differences:
+
+ <orderedlist>
+ <listitem>
+ <simpara>Running transactions and prepared transactions can be ignored if there
+ is no chance that they might appear in a multixact.</simpara>
+ </listitem>
+ <listitem>
+ <simpara>MXID information is not directly visible in system views such as
+ <literal>pg_stat_activity</literal>; however, looking for old XIDs is still a good
+ way of determining which transactions are causing MXID wraparound problems.</simpara>
+ </listitem>
+ <listitem>
+ <simpara>XID exhaustion will block all write transactions, but MXID exhaustion will
+ only block a subset of write transactions, specifically those that involve
+ row locks that require an MXID.</simpara>
+ </listitem>
+ </orderedlist>
+ </para>
+
</sect3>
</sect2>
--
2.37.1 (Apple Git-137.1)
On Thu, Oct 12, 2023 at 1:10 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 12, 2023 at 12:01 PM Peter Geoghegan <pg@bowt.ie> wrote:
No objections from me.
Here is a doc-only patch that I think could be back-patched as far as
emergency mode exists. It combines all of the wording changes to the
documentation from v1-v3 of the previous version, but without changing
the message text that is quoted in the documentation, and without
adding more instances of similar message texts to the documentation,
and with a bunch of additional hacking by me.
It's a bit weird that we're effectively saying "pay no attention to
that terrible HINT"...but I get it. The simple fact is that the docs
were written in a way that allowed misinformation to catch on -- the
damage that needs to be undone isn't exactly limited to the docs
themselves.
Your choice to not backpatch the changes to the log messages makes a
lot more sense, now that I see that I see the wider context built by
this preparatory patch. Arguably, it would be counterproductive to
pretend that we didn't make this mistake on the backbranches. Better
to own the mistake.
Some things I changed:
- I made it so that the MXID section refers back to the XID section
instead of duplicating it, but with a short list of differences.
- I weakened the existing claim that says you must be a superuser or
VACUUM definitely won't fix it to say instead that you SHOULD run
VACUUM as the superuser, because the former is false and the latter is
true.
- I made the list of steps for recovering more explicit.
- I split out the bit about running autovacuum in the affected
database into a separate step to be performed after VACUUM for
continued good operation, rather than a necessary ingredient in
recovery, because it isn't.
- A bit of other minor rejiggering.
Those all make sense to me.
I'm not forgetting about the rest of the proposed patch set, or the
change I proposed earlier. I'm just posting this much now because this
is how far I got today, and it would be useful to get comments before
I go further. I think the residual portion of the patch set not
included in this documentation patch will be quite small, and I think
that's a good thing, but again, I don't intend to blow that off.
Of course. Your general approach seems wise.
Thanks for working on this. I will be relieved once this is finally
taken care of.
--
Peter Geoghegan
Hi,
Those all make sense to me.
[...]
Of course. Your general approach seems wise.
Thanks for working on this. I will be relieved once this is finally
taken care of.
+1, and many thanks for your attention to the patchset and all the details!
--
Best regards,
Aleksander Alekseev
On Fri, Oct 13, 2023 at 5:03 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
Thanks for working on this. I will be relieved once this is finally
taken care of.+1, and many thanks for your attention to the patchset and all the details!
Cool. I committed that and back-patched to v14, and here's the rest.
0001 makes the terminology change that I proposed earlier, and then
0002 is the remainder of what was in the previous patch set that
wasn't covered by what I committed already, with a few adjustments.
In particular, I preferred to stick with "avoid" rather than changing
to "prevent," and I thought it was clearer to refer to "failures"
plural rather than "failure" collective. These are arguable decisions,
though.
I propose to commit these changes only to master. I have included a
fairly long paragraph about that in the commit message for 0002.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v11-0002-Reword-messages-about-impending-M-XID-exhaustion.patchapplication/octet-stream; name=v11-0002-Reword-messages-about-impending-M-XID-exhaustion.patchDownload
From 94875d3213df0c60f74bc4ffce6ac13a9ab2350f Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 16 Oct 2023 13:46:01 -0400
Subject: [PATCH v11 2/2] Reword messages about impending (M)XID exhaustion.
First, we shouldn't recommend switching to single-user mode, because
that's terrible advice. Especially on newer versions where VACUUM
will enter emergency mode when nearing (M)XID exhaustion, it's
perfectly fine to just VACUUM in multi-user mode. Doing it that way
is less disruptive and avoids disabling the safeguards that prevent
actual wraparound, so recommend that instead.
Second, be more precise about what is going to happen (when we're
nearing the limits) or what is happening (when we actually hit them).
The database doesn't shut down, nor does it refuse all commands. It
refuses commands that assign whichever of XIDs and MXIDs are nearly
exhausted.
No back-patch. The existing hint that advises going to single-user
mode is sufficiently awful advice that removing it or changing it
might be justifiable even though we normally avoid changing
user-facing messages in back-branches, but I (rhaas) felt that it
was better to be more conservative and limit this fix to master
only. Aside from the usual risk of breaking transalations, people
might be used to the existing message, or even have monitoring
scripts that look for it.
Alexander Alekseev, John Naylor, Robert Haas, reviewed at various
times by Peter Geoghegan, Hannu Krosing, and Andres Freund.
---
doc/src/sgml/maintenance.sgml | 11 ++++++-----
src/backend/access/transam/multixact.c | 4 ++--
src/backend/access/transam/varsup.c | 16 ++++++++--------
3 files changed, 16 insertions(+), 15 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 66f2c6a02e..9389eef973 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -656,7 +656,7 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
<programlisting>
WARNING: database "mydb" must be vacuumed within 39985967 transactions
-HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
+HINT: To avoid XID assignment failures, execute a database-wide VACUUM in that database.
</programlisting>
(A manual <command>VACUUM</command> should fix the problem, as suggested by the
@@ -667,16 +667,17 @@ HINT: To avoid a database shutdown, execute a database-wide VACUUM in that data
there are fewer than three million transactions left until wraparound:
<programlisting>
-ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
-HINT: Stop the postmaster and vacuum that database in single-user mode.
+ERROR: database is not accepting commands that assign new XIDs to avoid wraparound data loss in database "mydb"
+HINT: Execute a database-wide VACUUM in that database.
</programlisting>
In this condition any transactions already in progress can continue,
but only read-only transactions can be started. Operations that
modify database records or truncate relations will fail.
The <command>VACUUM</command> command can still be run normally.
- Contrary to what the hint states, it is not necessary or desirable to stop the
- postmaster or enter single user-mode in order to restore normal operation.
+ Note that, contrary to what was sometimes recommended in earlier releases,
+ it is not necessary or desirable to stop the postmaster or enter single
+ user-mode in order to restore normal operation.
Instead, follow these steps:
<orderedlist>
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 546d411610..57ed34c0a8 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2335,7 +2335,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit - curMulti,
oldest_datname,
multiWrapLimit - curMulti),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To avoid MultiXactId assignment failures, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
@@ -2344,7 +2344,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit - curMulti,
oldest_datoid,
multiWrapLimit - curMulti),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To avoid MultiXactId assignment failures, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 334adac09e..abfee48317 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -126,16 +126,16 @@ GetNewTransactionId(bool isSubXact)
if (oldest_datname)
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
+ errmsg("database is not accepting commands that assign new XIDs to avoid wraparound data loss in database \"%s\"",
oldest_datname),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("Execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u",
+ errmsg("database is not accepting commands that assign new XIDs to avoid wraparound data loss in database with OID %u",
oldest_datoid),
- errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+ errhint("Execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
@@ -148,14 +148,14 @@ GetNewTransactionId(bool isSubXact)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To avoid XID assignment failures, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - xid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To avoid XID assignment failures, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
@@ -463,14 +463,14 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
(errmsg("database \"%s\" must be vacuumed within %u transactions",
oldest_datname,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To avoid XID assignment failures, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(WARNING,
(errmsg("database with OID %u must be vacuumed within %u transactions",
oldest_datoid,
xidWrapLimit - curXid),
- errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+ errhint("To avoid XID assignment failures, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
}
}
--
2.37.1 (Apple Git-137.1)
v11-0001-Talk-about-assigning-rather-than-generating-new-.patchapplication/octet-stream; name=v11-0001-Talk-about-assigning-rather-than-generating-new-.patchDownload
From 569238c86ebb97ce96ae12e47c84e7ce7f81c68b Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 12 Oct 2023 14:50:15 -0400
Subject: [PATCH v11 1/2] Talk about assigning, rather than generating, new
MultiXactIds.
The word "assign" is used in various places internally to describe what
GetNewMultiXactId does, but the user-facing messages have previously
said "generate". For consistency, standardize on "assign," which seems
(at least to me) to be slightly clearer.
---
src/backend/access/transam/multixact.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index abb022e067..546d411610 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1020,14 +1020,14 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
if (oldest_datname)
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands that generate new MultiXactIds to avoid wraparound data loss in database \"%s\"",
+ errmsg("database is not accepting commands that assign new MultiXactIds to avoid wraparound data loss in database \"%s\"",
oldest_datname),
errhint("Execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
else
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("database is not accepting commands that generate new MultiXactIds to avoid wraparound data loss in database with OID %u",
+ errmsg("database is not accepting commands that assign new MultiXactIds to avoid wraparound data loss in database with OID %u",
oldest_datoid),
errhint("Execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
--
2.37.1 (Apple Git-137.1)
On Mon, Oct 16, 2023 at 11:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
I propose to commit these changes only to master. I have included a
fairly long paragraph about that in the commit message for 0002.
LGTM, except for one small detail: I noticed that you misspelled
"translations" in the commit message.
Thanks for getting this over the line
--
Peter Geoghegan
On Mon, Oct 16, 2023 at 3:46 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, Oct 16, 2023 at 11:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
I propose to commit these changes only to master. I have included a
fairly long paragraph about that in the commit message for 0002.LGTM, except for one small detail: I noticed that you misspelled
"translations" in the commit message.
Oops. Fixed locally.
Thanks for getting this over the line
Sure thing. I'm glad we're finally doing something about it.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
LGTM, except for one small detail: I noticed that you misspelled
"translations" in the commit message.Oops. Fixed locally.
v11-0001 and v11-0002 LGTM too. IMO "to assign a XID" sounds better
than "to generate a XID".
--
Best regards,
Aleksander Alekseev
On Tue, Oct 17, 2023 at 4:57 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
v11-0001 and v11-0002 LGTM too.
Cool. Seems we are all in agreement, so committed these. Thanks!
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Oct 17, 2023 at 9:39 PM Robert Haas <robertmhaas@gmail.com> wrote:
Cool. Seems we are all in agreement, so committed these. Thanks!
Thank you for getting this across the finish line!
Hello Robert,
17.10.2023 17:39, Robert Haas wrote:
On Tue, Oct 17, 2023 at 4:57 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:v11-0001 and v11-0002 LGTM too.
Cool. Seems we are all in agreement, so committed these. Thanks!
Please look at the following sentence added by the commit:
...
to issue manual <command>VACUUM</command> commands on the tables where
<structfield>relminxid</structfield> is oldest.
Isn't relminxid a typo there?
Best regards,
Alexander