How to stop autovacuum silently
Hi!
Recently, one of our customers had reported a not working autovacuum.
After a minor investigation, I've found that
autovacuum launcher did, actually, run vacuum as expected, but with no
results. At the same time, no warnings or
other anomies were present in the logs.
At first, I've thought may be statistics is broken, thus vacuum is not
working as expected. But in fact, something
more interesting is had happened.
The pg_class.relfrozenxid was set to some rubbish value from the future,
thus broken in template1 DB, so any new
database will have it's broken too. Then, we create "blocker" DB and then
in vac_update_datfrozenxid() we get "bogus" (from the future) value
of relfrozenxid and *silently* return. Any other new created DB will not
be autovacuumed.
Funny, but from the perspective of DBA, this looks like autovacuum is not
working any more for no reasons, although
all the criterion for its launch is clearly observed.
AFAICS, there are several solutions for this state:
- run vacuumdb for all DB's
- manually update broken pg_class.relfrozenxid
- lowering of autovacuum_freeze_max_age to trigger prevent of transaction
ID wraparound
I do understand, this behaviour hardly can be described as a bug of some
sort, but could we make, at least, a useful
message to help to clarify what is going on here?
=== REPRODUCE ===
$ cat <<EOF >> pgsql/data/postgresql.conf
autovacuum_naptime = 1s
autovacuum_freeze_max_age = 100000
EOF
$ ./pgsql/bin/pg_ctl -D pgsql/data -l pgsql/logfile start
waiting for server to start.... done
server started
$ ./pgsql/bin/psql postgres
psql (17devel)
Type "help" for help.
postgres=# \c template1
You are now connected to database "template1" as user "orlov".
template1=# update pg_class set relfrozenxid='200000' where oid = 1262;
UPDATE 1
template1=# do $$
begin
while 120000 - txid_current()::text::int8 > 0 loop
commit;
end loop;
end $$;
DO
template1=# create database blocker;
CREATE DATABASE
template1=# create database foo;
CREATE DATABASE
template1=# \c foo
You are now connected to database "foo" as user "orlov".
foo=# create table bar(baz int);
CREATE TABLE
foo=# insert into bar select bar from generate_series(1, 8192) bar;
INSERT 0 8192
foo=# update bar set baz=baz;
UPDATE 8192
foo=# select relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup,
n_dead_tup, last_vacuum, last_autovacuum, autovacuum_count
from
pg_stat_user_tables where relname = 'bar';
relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup |
last_vacuum | last_autovacuum | autovacuum_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+------------------
bar | 8192 | 8192 | 0 | 8192 | 8192 |
| | 0
(1 row)
foo=# update bar set baz=baz;
UPDATE 8192
foo=# select relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup,
n_dead_tup, last_vacuum, last_autovacuum, autovacuum_count
from
pg_stat_user_tables where relname = 'bar';
relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup |
last_vacuum | last_autovacuum | autovacuum_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+------------------
bar | 8192 | 16384 | 0 | 8192 | 16384 |
| | 0
(1 row)
... and so on
--
Best regards,
Maxim Orlov.
Attachments:
0001-Add-warning-if-datfrozenxid-or-datminmxid-is-not-set.patchapplication/octet-stream; name=0001-Add-warning-if-datfrozenxid-or-datminmxid-is-not-set.patchDownload
From 555f6037f5caa09d95767d40d966e677f2a3fa71 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <orlovmg@gmail.com>
Date: Wed, 22 Nov 2023 19:14:03 +0300
Subject: [PATCH] Add warning if datfrozenxid or datminmxid is not set due to
bogus relfrozenxid or relminmxid.
Author: Maxim Orlov <orlovmg@gmail.com>
---
src/backend/commands/vacuum.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 8bdbee6841..00712e7adb 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1660,6 +1660,13 @@ vac_update_datfrozenxid(void)
if (TransactionIdPrecedes(lastSaneFrozenXid, classForm->relfrozenxid))
{
bogus = true;
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("bogus relfrozenxid %llu for relation \"%s\"",
+ (unsigned long long) classForm->relfrozenxid,
+ NameStr(classForm->relname)),
+ errhint("Relfrozenxid seem to be in the future, last sane xact is %llu.",
+ (unsigned long long) lastSaneFrozenXid)));
break;
}
@@ -1674,6 +1681,13 @@ vac_update_datfrozenxid(void)
if (MultiXactIdPrecedes(lastSaneMinMulti, classForm->relminmxid))
{
bogus = true;
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("bogus relminmxid %llu for relation \"%s\"",
+ (unsigned long long) classForm->relminmxid,
+ NameStr(classForm->relname)),
+ errhint("Relminmxid seem to be in the future, last sane multixact is %llu. Consider running vacuumdb.",
+ (unsigned long long) lastSaneMinMulti)));
break;
}
--
2.42.0
On Wed, Nov 22, 2023 at 8:18 AM Maxim Orlov <orlovmg@gmail.com> wrote:
Recently, one of our customers had reported a not working autovacuum. After a minor investigation, I've found that
autovacuum launcher did, actually, run vacuum as expected, but with no results. At the same time, no warnings or
other anomies were present in the logs.
Are you aware of commit e83ebfe6d7, which added a similar WARNING at
the point when VACUUM overwrites a relfrozenxid/relminmxid "from the
future"? It's a recent one.
At first, I've thought may be statistics is broken, thus vacuum is not working as expected. But in fact, something
more interesting is had happened.
Was pg_upgrade even run against this database? My guess is that the
underlying problem was caused by the bug fixed by commit 74cf7d46.
--
Peter Geoghegan
On Wed, 22 Nov 2023 at 21:13, Peter Geoghegan <pg@bowt.ie> wrote:
Are you aware of commit e83ebfe6d7, which added a similar WARNING at
the point when VACUUM overwrites a relfrozenxid/relminmxid "from the
future"? It's a recent one.
Thank you for reply! I hadn't noticed it. But in described above case, it
doesn't
produce any warnings. My concern here is that with a couple of updates, we
can
stop autovacuum implicitly without any warnings.
Was pg_upgrade even run against this database? My guess is that the
underlying problem was caused by the bug fixed by commit 74cf7d46.
I'm pretty much sure it was, but, unfortunately, there are no way to 100%
confirm
this. All I know, they're using PG13 now.
--
Best regards,
Maxim Orlov.