DROP DATABASE is interruptible
Hi,
Unfortunately DROP DATABASE does not hold interrupts across its crucial steps. If
you e.g. set a breakpoint on DropDatabaseBuffers() and then send a
SIGINT, we'll process that interrupt before the transaction commits.
A later connection to that database ends with:
2023-03-14 10:22:24.443 PDT [3439153][client backend][3/2:0][[unknown]] PANIC: could not open critical system index 2662
It's not entirely obvious how to fix this. We can't just hold interrupts for
the whole transaction - for one, we hang if we do so, because it prevents
ourselves from absorbing our own barrier:
/* Close all smgr fds in all backends. */
WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
ISTM that at the very least dropdb() needs to internally commit *before*
dropping buffers - after that point the database is corrupt.
Greetings,
Andres Freund
I tried out the patch you posted over at [1]. For those wanting an
easy way to test it, or test the buggy behaviour in master without
this patch, you can simply kill -STOP the checkpointer, so that DROP
DATABASE hangs in RequestCheckpoint() (or you could SIGSTOP any other
backend so it hangs in the barrier thing instead), and then you can
just press ^C like this:
postgres=# create database db2;
CREATE DATABASE
postgres=# drop database db2;
^CCancel request sent
ERROR: canceling statement due to user request
After that you get:
$ psql db2
psql: error: connection to server on socket "/tmp/.s.PGSQL.5432"
failed: FATAL: database "db2" is invalid
DETAIL: Use DROP DATABASE to drop invalid databases
I suppose it should be a HINT?
+# FIXME: It'd be good to test the actual interruption path. But it's not
+# immediately obvious how.
I wonder if there is some way to incorporate something based on
SIGSTOP signals into the test, but I don't know how to do it on
Windows and maybe that's a bit weird anyway. For a non-OS-specific
way to do it, I was wondering about having a test module function that
has a wait loop that accepts ^C but deliberately ignores
ProcSignalBarrier, and leaving that running in a background psql for a
similar effect?
Not sure why the test is under src/test/recovery.
While a database exists in this state, we get periodic autovacuum
noise, which I guess we should actually skip? I suppose someone might
eventually wonder if autovacuum could complete the drop, but it seems
a bit of a sudden weird leap in duties and might be confusing (perhaps
it'd make more sense if 'invalid because creating' and 'invalid
because dropping' were distinguished).
2023-05-09 15:24:10.860 NZST [523191] FATAL: database "db2" is invalid
2023-05-09 15:24:10.860 NZST [523191] DETAIL: Use DROP DATABASE to
drop invalid databases
2023-05-09 15:25:10.883 NZST [523279] FATAL: database "db2" is invalid
2023-05-09 15:25:10.883 NZST [523279] DETAIL: Use DROP DATABASE to
drop invalid databases
2023-05-09 15:26:10.899 NZST [523361] FATAL: database "db2" is invalid
2023-05-09 15:26:10.899 NZST [523361] DETAIL: Use DROP DATABASE to
drop invalid databases
2023-05-09 15:27:10.919 NZST [523408] FATAL: database "db2" is invalid
2023-05-09 15:27:10.919 NZST [523408] DETAIL: Use DROP DATABASE to
drop invalid databases
2023-05-09 15:28:10.938 NZST [523456] FATAL: database "db2" is invalid
2023-05-09 15:28:10.938 NZST [523456] DETAIL: Use DROP DATABASE to
drop invalid databases
[1]: /messages/by-id/20230509013255.fjrlpitnj3ltur76@awork3.anarazel.de
On Tue, May 9, 2023 at 3:41 PM Thomas Munro <thomas.munro@gmail.com> wrote:
I tried out the patch you posted over at [1].
I forgot to add, +1, I think this is a good approach.
(I'm still a little embarrassed at how long we spent trying to debug
this in the other thread from the supplied clues, when you'd already
pointed this failure mechanism out including the exact error message a
couple of months ago. One thing I've noticed is that new threads
posted in the middle of commitfests are hard to see :-D We were
getting pretty close, though.)
Hi,
On 2023-05-09 15:41:36 +1200, Thomas Munro wrote:
I tried out the patch you posted over at [1].
Thanks!
$ psql db2
psql: error: connection to server on socket "/tmp/.s.PGSQL.5432"
failed: FATAL: database "db2" is invalid
DETAIL: Use DROP DATABASE to drop invalid databases
I suppose it should be a HINT?
Yup.
+# FIXME: It'd be good to test the actual interruption path. But it's not
+# immediately obvious how.
I wonder if there is some way to incorporate something based on
SIGSTOP signals into the test, but I don't know how to do it on
Windows and maybe that's a bit weird anyway. For a non-OS-specific
way to do it, I was wondering about having a test module function that
has a wait loop that accepts ^C but deliberately ignores
ProcSignalBarrier, and leaving that running in a background psql for a
similar effect?
Seems a bit too complicated.
We really need to work on a framework for this kind of thing.
Not sure why the test is under src/test/recovery.
Where else? We don't really have a place to put backend specific tests that
aren't about logical replication or recovery right now...
It partially is about dealing with crashes etc in the middle of DROP DATABASE,
so it doesn't seem unreasonable to me anyway.
While a database exists in this state, we get periodic autovacuum
noise, which I guess we should actually skip?
Yes, good catch.
We should also either reset datfrozenxid et al. when invalidating, or ignore it
when computing horizons.
I suppose someone might eventually wonder if autovacuum could complete the
drop, but it seems a bit of a sudden weird leap in duties and might be
confusing (perhaps it'd make more sense if 'invalid because creating' and
'invalid because dropping' were distinguished).
I'm a bit hesitant to do so for now. Once it's a bit more settled, maybe?
Although I wonder if there's something better suited to the task than
autovacuum.
Greetings,
Andres Freund
Hi,
I'm hacking on this bugfix again, thanks to Evgeny's reminder on the other
thread [1].
I've been adding checks for partially-dropped databases to the following places
so far:
- vac_truncate_clog(), as autovacuum can't process it anymore. Otherwise a
partially dropped database could easily lead to shutdown-due-to-wraparound.
- get_database_list() - so autovacuum workers don't error out when connecting
- template database used by CREATE DATABASE
- pg_dumpall, so we don't try to connect to the database
- vacuumdb, clusterdb, reindexdb, same
It's somewhat annoying that there is no shared place for the relevant query
for the client-side cases.
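For reference, the filter that ends up duplicated in each of those client-side
tools is roughly this (taken from the patch's changes to vacuumdb et al.):

```sql
-- Each tool (pg_dumpall, vacuumdb, clusterdb, reindexdb) open-codes this
-- when listing target databases; datconnlimit = -2 marks a partially
-- dropped, invalid database.
SELECT datname
FROM pg_database
WHERE datallowconn AND datconnlimit <> -2
ORDER BY 1;
```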
I haven't yet added checks to pg_upgrade, even though that's clearly
needed. I'm waffling a bit between erroring out and just ignoring the
database? pg_upgrade already fails when datallowconn is set "wrongly", see
check_proper_datallowconn(). Any opinions?
I'm not sure what should be done for psql. It's probably not a good idea to
change tab completion, that'd just make it appear the database is gone. But \l
could probably show dropped databases more prominently?
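As a sketch of what \l (or a curious user) could surface, something like the
following; the "invalid" column label is made up for illustration, not an
existing psql output column:

```sql
-- Hypothetical way to flag partially-dropped databases in a listing.
SELECT datname,
       datconnlimit = -2 AS invalid
FROM pg_database
ORDER BY 1;
```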
We don't really have a good place for database-specific
code. dbcommands.[ch] are for commands (duh), but already contain a bunch of
functions that don't really belong there. Seems we should add a
catalog/pg_database.c or catalog/database.c (tbh, I don't really know which we
use for what). But that's probably for HEAD only.
dbcommands.c's get_db_info() seems to have gone completely off the deep
end. It returns information in 14 separate out parameters, and the variables
for that need to all exhaustively be declared. And of course that differs
heavily between releases, making it a pain to backpatch any change. ISTM we
should just define a struct for the parameters - alternatively we could just
return a copy of the pg_database tuple, but it looks like the variable-width
attributes would make that *just* about a loss.
I guess that's once more something better dealt with on HEAD, but damn, I'm
not relishing having to deal with backpatching anything touching it - I think
it might be reasonable to just open-code fetching datconnlimit :/.
This patch is starting to be a bit big, particularly once adding tests for all
the checks mentioned above - but I haven't heard of or thought of a better
proposal :(.
Greetings,
Andres Freund
[1]: /messages/by-id/01020188d31d0a86-16af92c0-4466-4cb6-a2e8-0e5898aab800-000000@eu-west-1.amazonses.com
Hi,
On 2023-05-09 15:41:36 +1200, Thomas Munro wrote:
+# FIXME: It'd be good to test the actual interruption path. But it's not
+# immediately obvious how.
I wonder if there is some way to incorporate something based on
SIGSTOP signals into the test, but I don't know how to do it on
Windows and maybe that's a bit weird anyway. For a non-OS-specific
way to do it, I was wondering about having a test module function that
has a wait loop that accepts ^C but deliberately ignores
ProcSignalBarrier, and leaving that running in a background psql for a
similar effect?
I found a way to test it reliably, albeit partially. However, I'm not sure
where to do so / if it's worth doing so.
The problem occurs once remove_dbtablespaces() starts working. The fix does a
heap_inplace_update() before that. So to reproduce the problem one session can
lock pg_tablespace, another can drop a database. Then the second session can
be cancelled by the first.
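Spelled out as plain psql steps, the reproduction looks roughly like this (a
sketch only; `db_to_drop` is a placeholder name, and the two sessions are
marked by comments rather than being runnable as a single script):

```sql
-- Session 1: take a lock that DROP DATABASE will block on once it reaches
-- remove_dbtablespaces(), i.e. after heap_inplace_update() has already run.
BEGIN;
LOCK TABLE pg_tablespace;

-- Session 2: blocks waiting for the lock, with the pg_database row already
-- marked datconnlimit = -2.
DROP DATABASE db_to_drop;

-- Session 1: cancel session 2, leaving db_to_drop invalid, then clean up.
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE query LIKE 'DROP DATABASE%';
ROLLBACK;
```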
Waiting for locks to be acquired etc. is somewhat cumbersome in a TAP
test. It'd be easier in an isolation test. But I don't think we want to do
this as part of the normal isolation schedule?
So just open-coding it in a TAP test seems to be the best way?
Is it worth doing?
Greetings,
Andres Freund
Hi,
On 2023-06-21 12:02:04 -0700, Andres Freund wrote:
I'm hacking on this bugfix again, thanks to Evgeny's reminder on the other
thread [1].
I've been adding checks for partially-dropped databases to the following places
so far:
- vac_truncate_clog(), as autovacuum can't process it anymore. Otherwise a
partially dropped database could easily lead to shutdown-due-to-wraparound.
- get_database_list() - so autovacuum workers don't error out when connecting
- template database used by CREATE DATABASE
- pg_dumpall, so we don't try to connect to the database
- vacuumdb, clusterdb, reindexdb, same
Also pg_amcheck.
It's somewhat annoying that there is no shared place for the relevant query
for the client-side cases.
Still the case. I looked around, and it doesn't look like we do anything smart
anywhere :/
I haven't yet added checks to pg_upgrade, even though that's clearly
needed. I'm waffling a bit between erroring out and just ignoring the
database? pg_upgrade already fails when datallowconn is set "wrongly", see
check_proper_datallowconn(). Any opinions?
There don't need to be explicit checks, because pg_upgrade will fail, because
it connects to every database. Obviously the error could be nicer, but it
seems ok for something hopefully very rare. I did add a test ensuring that the
behaviour is caught.
It's somewhat odd that pg_upgrade prints errors on stdout...
I'm not sure what should be done for psql. It's probably not a good idea to
change tab completion, that'd just make it appear the database is gone. But \l
could probably show dropped databases more prominently?
I have not done that. I wonder if this is something that should be done in the
back branches?
Greetings,
Andres Freund
Attachments:
v2-0001-Add-missing-lock-releases-to-vac_truncate_clog.patch (text/x-diff)
From 2c8baf448fddacd14c478da0abe30aa45391dff9 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 22 Jun 2023 17:27:54 -0700
Subject: [PATCH v2 1/2] Add missing lock releases to vac_truncate_clog()
---
src/backend/commands/vacuum.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index bb79de4da6a..984c98a5df1 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1902,12 +1902,16 @@ vac_truncate_clog(TransactionId frozenXID,
ereport(WARNING,
(errmsg("some databases have not been vacuumed in over 2 billion transactions"),
errdetail("You might have already suffered transaction-wraparound data loss.")));
+ LWLockRelease(WrapLimitsVacuumLock);
return;
}
/* chicken out if data is bogus in any other way */
if (bogus)
+ {
+ LWLockRelease(WrapLimitsVacuumLock);
return;
+ }
/*
* Advance the oldest value for commit timestamps before truncating, so
--
2.38.0
v2-0002-Handle-interrupted-DROP-DATABASE.patch (text/x-diff)
From e1c67f6a1c028ae111dacb0867b8bc6a478678ab Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 8 May 2023 18:28:33 -0700
Subject: [PATCH v2 2/2] Handle interrupted DROP DATABASE
Author:
Reviewed-by:
Discussion: https://postgr.es/m/20230509004637.cgvmfwrbht7xm7p6@awork3.anarazel.de
Backpatch:
---
src/include/catalog/pg_database.h | 10 +-
src/backend/commands/dbcommands.c | 102 ++++++++++++---
src/backend/commands/vacuum.c | 10 ++
src/backend/postmaster/autovacuum.c | 7 ++
src/backend/utils/init/postinit.c | 10 ++
src/bin/pg_amcheck/pg_amcheck.c | 2 +-
src/bin/pg_amcheck/t/002_nonesuch.pl | 34 +++++
src/bin/pg_dump/pg_dumpall.c | 4 +-
src/bin/pg_dump/t/002_pg_dump.pl | 17 +++
src/bin/scripts/clusterdb.c | 2 +-
src/bin/scripts/reindexdb.c | 2 +-
src/bin/scripts/t/011_clusterdb_all.pl | 14 +++
src/bin/scripts/t/050_dropdb.pl | 9 ++
src/bin/scripts/t/091_reindexdb_all.pl | 14 +++
src/bin/scripts/t/101_vacuumdb_all.pl | 14 +++
src/bin/scripts/vacuumdb.c | 2 +-
src/test/recovery/meson.build | 1 +
src/test/recovery/t/037_invalid_database.pl | 130 ++++++++++++++++++++
18 files changed, 361 insertions(+), 23 deletions(-)
create mode 100644 src/test/recovery/t/037_invalid_database.pl
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index d004f4dc8aa..93206bafaca 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -49,7 +49,11 @@ CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID
/* new connections allowed? */
bool datallowconn;
- /* max connections allowed (-1=no limit) */
+ /*
+ * Max connections allowed (-1=no limit, -2=invalid database). A database
+ * is set to invalid partway through being dropped. Using datconnlimit=-2
+ * for this purpose isn't particularly clean, but is backpatchable.
+ */
int32 datconnlimit;
/* all Xids < this are frozen in this DB */
@@ -103,4 +107,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_database_oid_index, 2672, DatabaseOidIndexId, on pg
DECLARE_OID_DEFINING_MACRO(Template0DbOid, 4);
DECLARE_OID_DEFINING_MACRO(PostgresDbOid, 5);
+
+extern bool database_is_invalid_form(Form_pg_database datform);
+extern bool database_is_invalid_oid(Oid dboid);
+
#endif /* PG_DATABASE_H */
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 09f1ab41ad3..a9f00403955 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -977,6 +977,16 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
errmsg("template database \"%s\" does not exist",
dbtemplate)));
+ /*
+ * If the source database was in the process of being dropped, we can't
+ * use it as a template.
+ */
+ if (database_is_invalid_oid(src_dboid))
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot use invalid database \"%s\" as template", dbtemplate),
+ errhint("Use DROP DATABASE to drop invalid databases"));
+
/*
* Permission check: to copy a DB that's not marked datistemplate, you
* must be superuser or the owner thereof.
@@ -1576,6 +1586,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
bool db_istemplate;
Relation pgdbrel;
HeapTuple tup;
+ Form_pg_database datform;
int notherbackends;
int npreparedxacts;
int nslots,
@@ -1691,17 +1702,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
dbname),
errdetail_busy_db(notherbackends, npreparedxacts)));
- /*
- * Remove the database's tuple from pg_database.
- */
- tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(db_id));
- if (!HeapTupleIsValid(tup))
- elog(ERROR, "cache lookup failed for database %u", db_id);
-
- CatalogTupleDelete(pgdbrel, &tup->t_self);
-
- ReleaseSysCache(tup);
-
/*
* Delete any comments or security labels associated with the database.
*/
@@ -1718,6 +1718,37 @@ dropdb(const char *dbname, bool missing_ok, bool force)
*/
dropDatabaseDependencies(db_id);
+ /*
+ * Tell the cumulative stats system to forget it immediately, too.
+ */
+ pgstat_drop_database(db_id);
+
+ tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for database %u", db_id);
+ datform = (Form_pg_database) GETSTRUCT(tup);
+
+ /*
+ * Except for the deletion of the catalog row, subsequent actions are not
+ * transactional (consider DropDatabaseBuffers() discarding modified
+ * buffers). But we might crash or get interrupted below. To prevent
+ * accesses to a database with invalid contents, mark the database as
+ * invalid using an in-place update.
+ *
+ * We need to flush the WAL before continuing, to guarantee the
+ * modification is durable before performing irreversible filesystem
+ * operations.
+ */
+ datform->datconnlimit = -2;
+ heap_inplace_update(pgdbrel, tup);
+ XLogFlush(XactLastRecEnd);
+
+ /*
+ * Also delete the tuple - transactionally. If this transaction commits,
+ * the row will be gone, but if we fail, dropdb() can be invoked again.
+ */
+ CatalogTupleDelete(pgdbrel, &tup->t_self);
+
/*
* Drop db-specific replication slots.
*/
@@ -1730,11 +1761,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
*/
DropDatabaseBuffers(db_id);
- /*
- * Tell the cumulative stats system to forget it immediately, too.
- */
- pgstat_drop_database(db_id);
-
/*
* Tell checkpointer to forget any pending fsync and unlink requests for
* files in the database; else the fsyncs will fail at next checkpoint, or
@@ -2346,6 +2372,14 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
datform = (Form_pg_database) GETSTRUCT(tuple);
dboid = datform->oid;
+ if (datform->datconnlimit == -2)
+ {
+ ereport(FATAL,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot alter invalid database \"%s\"", stmt->dbname),
+ errdetail("Use DROP DATABASE to drop invalid databases"));
+ }
+
if (!object_ownercheck(DatabaseRelationId, dboid, GetUserId()))
aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
stmt->dbname);
@@ -3064,6 +3098,42 @@ get_database_name(Oid dbid)
return result;
}
+
+/*
+ * While dropping a database the pg_database row is marked invalid, but the
+ * catalog contents still exist. Connections to such a database are not
+ * allowed.
+ */
+bool
+database_is_invalid_form(Form_pg_database datform)
+{
+ return datform->datconnlimit == -2;
+}
+
+
+/*
+ * Convenience wrapper around database_is_invalid_form()
+ */
+bool
+database_is_invalid_oid(Oid dboid)
+{
+ HeapTuple dbtup;
+ Form_pg_database dbform;
+ bool invalid;
+
+ dbtup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dboid));
+ if (!HeapTupleIsValid(dbtup))
+ elog(ERROR, "cache lookup failed for database %u", dboid);
+ dbform = (Form_pg_database) GETSTRUCT(dbtup);
+
+ invalid = database_is_invalid_form(dbform);
+
+ ReleaseSysCache(dbtup);
+
+ return invalid;
+}
+
+
/*
* recovery_create_dbdir()
*
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 984c98a5df1..636c4cd56c7 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1859,6 +1859,16 @@ vac_truncate_clog(TransactionId frozenXID,
Assert(TransactionIdIsNormal(datfrozenxid));
Assert(MultiXactIdIsValid(datminmxid));
+ /*
+ * If database is in the process of getting dropped, or has been
+ * interrupted while doing so, no connections to it are possible
+ * anymore. Therefore we don't need to take it into account
+ * here. Which is good, because it can't be processed by autovacuum
+ * either.
+ */
+ if (database_is_invalid_form((Form_pg_database) dbform))
+ continue;
+
/*
* If things are working properly, no database should have a
* datfrozenxid or datminmxid that is "in the future". However, such
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f929b62e8ad..9a6291987d7 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1972,6 +1972,13 @@ get_database_list(void)
avw_dbase *avdb;
MemoryContext oldcxt;
+ /*
+ * If database has partially been dropped, we can't, nor need to,
+ * vacuum it.
+ */
+ if (database_is_invalid_form(pgdatabase))
+ continue;
+
/*
* Allocate our results in the caller's context, not the
* transaction's. We do this inside the loop, and restore the original
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 561bd13ed24..ec48bf91044 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -1116,6 +1116,7 @@ InitPostgres(const char *in_dbname, Oid dboid,
if (!bootstrap)
{
HeapTuple tuple;
+ Form_pg_database datform;
tuple = GetDatabaseTuple(dbname);
if (!HeapTupleIsValid(tuple) ||
@@ -1125,6 +1126,15 @@ InitPostgres(const char *in_dbname, Oid dboid,
(errcode(ERRCODE_UNDEFINED_DATABASE),
errmsg("database \"%s\" does not exist", dbname),
errdetail("It seems to have just been dropped or renamed.")));
+
+ datform = (Form_pg_database) GETSTRUCT(tuple);
+ if (database_is_invalid_form(datform))
+ {
+ ereport(FATAL,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot connect to invalid database \"%s\"", dbname),
+ errhint("Use DROP DATABASE to drop invalid databases"));
+ }
}
/*
diff --git a/src/bin/pg_amcheck/pg_amcheck.c b/src/bin/pg_amcheck/pg_amcheck.c
index 68f8180c19f..57df14bc1e0 100644
--- a/src/bin/pg_amcheck/pg_amcheck.c
+++ b/src/bin/pg_amcheck/pg_amcheck.c
@@ -1590,7 +1590,7 @@ compile_database_list(PGconn *conn, SimplePtrList *databases,
"FROM pg_catalog.pg_database d "
"LEFT OUTER JOIN exclude_raw e "
"ON d.datname ~ e.rgx "
- "\nWHERE d.datallowconn "
+ "\nWHERE d.datallowconn AND datconnlimit != -2 "
"AND e.pattern_id IS NULL"
"),"
diff --git a/src/bin/pg_amcheck/t/002_nonesuch.pl b/src/bin/pg_amcheck/t/002_nonesuch.pl
index cf2438717e1..c7d6c2426e6 100644
--- a/src/bin/pg_amcheck/t/002_nonesuch.pl
+++ b/src/bin/pg_amcheck/t/002_nonesuch.pl
@@ -291,6 +291,40 @@ $node->command_checks_all(
'many unmatched patterns and one matched pattern under --no-strict-names'
);
+
+#########################################
+# Test that an invalid / partially dropped database won't be targeted
+
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+
+$node->command_checks_all(
+ [
+ 'pg_amcheck', '-d', 'invalid'
+ ],
+ 1,
+ [qr/^$/],
+ [
+ qr/pg_amcheck: error: no connectable databases to check matching "invalid"/,
+ ],
+ 'checking handling of invalid database');
+
+$node->command_checks_all(
+ [
+ 'pg_amcheck', '-d', 'postgres',
+ '-t', 'invalid.public.foo',
+ ],
+ 1,
+ [qr/^$/],
+ [
+ qr/pg_amcheck: error: no connectable databases to check matching "invalid.public.foo"/,
+ ],
+ 'checking handling of object in invalid database');
+
+
#########################################
# Test checking otherwise existent objects but in databases where they do not exist
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 3627b69e2a6..2cad796d3f5 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -1345,7 +1345,7 @@ dropDBs(PGconn *conn)
res = executeQuery(conn,
"SELECT datname "
"FROM pg_database d "
- "WHERE datallowconn "
+ "WHERE datallowconn AND datconnlimit != -2 "
"ORDER BY datname");
if (PQntuples(res) > 0)
@@ -1488,7 +1488,7 @@ dumpDatabases(PGconn *conn)
res = executeQuery(conn,
"SELECT datname "
"FROM pg_database d "
- "WHERE datallowconn "
+ "WHERE datallowconn AND datconnlimit != -2 "
"ORDER BY (datname <> 'template1'), datname");
if (PQntuples(res) > 0)
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 63bb4689d44..ea9cb433b1d 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -1907,6 +1907,15 @@ my %tests = (
},
},
+ 'CREATE DATABASE invalid...' => {
+ create_order => 1,
+ create_sql => q(CREATE DATABASE invalid; UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid'),
+ regexp => qr/^CREATE DATABASE invalid/m,
+ not_like => {
+ pg_dumpall_dbprivs => 1,
+ },
+ },
+
'CREATE ACCESS METHOD gist2' => {
create_order => 52,
create_sql =>
@@ -4690,6 +4699,14 @@ command_fails_like(
qr/pg_dump: error: connection to server .* failed: FATAL: database "qqq" does not exist/,
'connecting to a non-existent database');
+#########################################
+# Test connecting to an invalid database
+
+$node->command_fails_like(
+ [ 'pg_dump', '-d', 'invalid' ],
+ qr/pg_dump: error: connection to server .* failed: FATAL: cannot connect to invalid database "invalid"/,
+ 'connecting to an invalid database');
+
#########################################
# Test connecting with an unprivileged user
diff --git a/src/bin/scripts/clusterdb.c b/src/bin/scripts/clusterdb.c
index e3585b3272e..3fea3a8e268 100644
--- a/src/bin/scripts/clusterdb.c
+++ b/src/bin/scripts/clusterdb.c
@@ -234,7 +234,7 @@ cluster_all_databases(ConnParams *cparams, const char *progname,
int i;
conn = connectMaintenanceDatabase(cparams, progname, echo);
- result = executeQuery(conn, "SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;", echo);
+ result = executeQuery(conn, "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;", echo);
PQfinish(conn);
for (i = 0; i < PQntuples(result); i++)
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index 3e8f6aca40e..5e5b075d563 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -718,7 +718,7 @@ reindex_all_databases(ConnParams *cparams,
int i;
conn = connectMaintenanceDatabase(cparams, progname, echo);
- result = executeQuery(conn, "SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;", echo);
+ result = executeQuery(conn, "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;", echo);
PQfinish(conn);
for (i = 0; i < PQntuples(result); i++)
diff --git a/src/bin/scripts/t/011_clusterdb_all.pl b/src/bin/scripts/t/011_clusterdb_all.pl
index eb904c08c70..2abfd1c0a97 100644
--- a/src/bin/scripts/t/011_clusterdb_all.pl
+++ b/src/bin/scripts/t/011_clusterdb_all.pl
@@ -21,4 +21,18 @@ $node->issues_sql_like(
qr/statement: CLUSTER.*statement: CLUSTER/s,
'cluster all databases');
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+$node->command_ok([ 'clusterdb', '-a' ],
+ 'invalid database not targeted by clusterdb -a');
+
+# Doesn't quite belong here, but don't want to waste time by creating an
+# invalid database in 010_clusterdb.pl as well.
+$node->command_fails_like([ 'clusterdb', '-d', 'invalid'],
+ qr/FATAL: cannot connect to invalid database "invalid"/,
+ 'clusterdb cannot target invalid database');
+
done_testing();
diff --git a/src/bin/scripts/t/050_dropdb.pl b/src/bin/scripts/t/050_dropdb.pl
index 9f2b6463b82..8a55179114e 100644
--- a/src/bin/scripts/t/050_dropdb.pl
+++ b/src/bin/scripts/t/050_dropdb.pl
@@ -31,4 +31,13 @@ $node->issues_sql_like(
$node->command_fails([ 'dropdb', 'nonexistent' ],
'fails with nonexistent database');
+# check that invalid database can be dropped with dropdb
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+$node->command_ok([ 'dropdb', 'invalid' ],
+ 'invalid database can be dropped');
+
done_testing();
diff --git a/src/bin/scripts/t/091_reindexdb_all.pl b/src/bin/scripts/t/091_reindexdb_all.pl
index ac62b9b5585..4363e0c900a 100644
--- a/src/bin/scripts/t/091_reindexdb_all.pl
+++ b/src/bin/scripts/t/091_reindexdb_all.pl
@@ -18,4 +18,18 @@ $node->issues_sql_like(
qr/statement: REINDEX.*statement: REINDEX/s,
'reindex all databases');
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+$node->command_ok([ 'reindexdb', '-a' ],
+ 'invalid database not targeted by reindexdb -a');
+
+# Doesn't quite belong here, but don't want to waste time by creating an
+# invalid database in 090_reindexdb.pl as well.
+$node->command_fails_like([ 'reindexdb', '-d', 'invalid'],
+ qr/FATAL: cannot connect to invalid database "invalid"/,
+ 'reindexdb cannot target invalid database');
+
done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 0f9d5adc48b..e49c3cadc62 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -16,4 +16,18 @@ $node->issues_sql_like(
qr/statement: VACUUM.*statement: VACUUM/s,
'vacuum all databases');
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+$node->command_ok([ 'vacuumdb', '-a' ],
+ 'invalid database not targeted by vacuumdb -a');
+
+# Doesn't quite belong here, but don't want to waste time by creating an
+# invalid database in 010_vacuumdb.pl as well.
+$node->command_fails_like([ 'vacuumdb', '-d', 'invalid'],
+ qr/FATAL: cannot connect to invalid database "invalid"/,
+ 'vacuumdb cannot target invalid database');
+
done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 4b17a070890..f03d5b1c6cb 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -883,7 +883,7 @@ vacuum_all_databases(ConnParams *cparams,
conn = connectMaintenanceDatabase(cparams, progname, echo);
result = executeQuery(conn,
- "SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;",
+ "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
echo);
PQfinish(conn);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 20089580100..e7328e48944 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -42,6 +42,7 @@ tests += {
't/034_create_database.pl',
't/035_standby_logical_decoding.pl',
't/036_truncated_dropped.pl',
+ 't/037_invalid_database.pl',
],
},
}
diff --git a/src/test/recovery/t/037_invalid_database.pl b/src/test/recovery/t/037_invalid_database.pl
new file mode 100644
index 00000000000..69a3531afc4
--- /dev/null
+++ b/src/test/recovery/t/037_invalid_database.pl
@@ -0,0 +1,130 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+#
+# Test we handle interrupted DROP DATABASE correctly.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init;
+$node->append_conf(
+ "postgresql.conf", qq(
+autovacuum = off
+max_prepared_transactions=5
+log_min_duration_statement=0
+log_connections=on
+log_disconnections=on
+));
+
+$node->start;
+
+
+# First verify that we can't connect to or ALTER an invalid database. Just
+# mark the database as invalid ourselves, that's more reliable than hitting the
+# required race conditions (see testing further down)...
+
+$node->safe_psql(
+ "postgres", qq(
+CREATE DATABASE invalid;
+UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+
+my $psql_stdout = '';
+my $psql_stderr = '';
+diag $psql_stderr;
+diag $psql_stdout;
+
+ok($node->psql('invalid', '', stderr=>\$psql_stderr) == 2,
+ "can't connect to invalid database - error code");
+like($psql_stderr, qr/FATAL:\s+cannot connect to invalid database "invalid"/,
+ "can't connect to invalid database - error message");
+
+ok( $node->psql('postgres', 'ALTER DATABASE invalid CONNECTION LIMIT 10') ==
+ 2,
+ "can't ALTER invalid database");
+
+# check invalid database can't be used as a template
+ok( $node->psql('postgres', 'CREATE DATABASE copy_invalid TEMPLATE invalid')
+ == 3,
+ "can't use invalid database as template");
+
+
+# Verify that VACUUM ignores an invalid database when computing how much of
+# the clog is needed (vac_truncate_clog()). For that we modify the pg_database
+# row of the invalid database to have an outdated datfrozenxid.
+$psql_stderr = '';
+$node->psql(
+ 'postgres',
+ qq(
+UPDATE pg_database SET datfrozenxid = '123456' WHERE datname = 'invalid';
+DROP TABLE IF EXISTS foo_tbl; CREATE TABLE foo_tbl();
+VACUUM FREEZE;),
+ stderr => \$psql_stderr);
+unlike(
+ $psql_stderr,
+ qr/some databases have not been vacuumed in over 2 billion transactions/,
+ "invalid databases are ignored by vac_truncate_clog");
+
+
+# But we need to be able to drop an invalid database.
+ok($node->psql('postgres', 'DROP DATABASE invalid', stdout => \$psql_stdout, stderr => \$psql_stderr) == 0,
+ "can DROP invalid database");
+
+# Ensure database is gone
+ok($node->psql('postgres', 'DROP DATABASE invalid') == 3,
+ "can't drop already dropped database");
+
+
+# Test that interruption of DROP DATABASE is handled properly. To ensure the
+# interruption happens at the appropriate moment, we lock pg_tablespace. DROP
+# DATABASE scans pg_tablespace once it has reached the "irreversible" part of
+# dropping the database, making it a suitable point to wait.
+my $bgpsql = $node->background_psql('postgres', on_error_stop => 0);
+my $pid = $bgpsql->query('SELECT pg_backend_pid()');
+note "background pid is $pid";
+
+# create the database, prevent drop database via lock held by a 2PC transaction
+$bgpsql->query_safe(
+ qq(
+ CREATE DATABASE invalid_interrupt;
+ BEGIN;
+ LOCK pg_tablespace;
+ PREPARE TRANSACTION 'lock_tblspc';));
+
+# Try to drop. This will wait due to the still held lock.
+$bgpsql->query_until(qr//, "DROP DATABASE invalid_interrupt;\n");
+
+# Ensure we're waiting for the lock
+$node->poll_query_until('postgres',
+ qq(SELECT EXISTS(SELECT * FROM pg_locks WHERE NOT granted AND relation = 'pg_tablespace'::regclass AND mode = 'AccessShareLock');)
+);
+
+# and finally interrupt the DROP DATABASE
+ok($node->safe_psql('postgres', "SELECT pg_cancel_backend($pid)"),
+ "cancelled DROP DATABASE");
+
+# wait for cancellation to be processed
+pump_until(
+ $bgpsql->{run}, $bgpsql->{timeout}, \$bgpsql->{stderr},
+ (qr/canceling statement due to user request/, ";\n"),
+ "cancel processed");
+$bgpsql->{stderr} = '';
+
+# verify that connections to the database aren't allowed
+ok( $node->psql('invalid_interrupt', '') == 2,
+ "can't connect to invalid_interrupt database");
+
+# To properly drop the database, we need to release the lock that previously
+# prevented doing so.
+ok($bgpsql->query_safe(qq(ROLLBACK PREPARED 'lock_tblspc')),
+ "unblock DROP DATABASE");
+
+ok($bgpsql->query(qq(DROP DATABASE invalid_interrupt)),
+ "DROP DATABASE invalid_interrupt");
+
+$bgpsql->quit();
+
+done_testing();
--
2.38.0
On 25 Jun 2023, at 19:03, Andres Freund <andres@anarazel.de> wrote:
On 2023-06-21 12:02:04 -0700, Andres Freund wrote:
I'm hacking on this bugfix again,
This patch LGTM from reading through and testing (manually and with your
supplied tests in the patch), I think this is a sound approach to deal with
this.
I've been adding checks for partially-dropped databases to the following places
so far:
- vac_truncate_clog(), as autovacuum can't process it anymore. Otherwise a
partially dropped database could easily lead to shutdown-due-to-wraparound.
- get_database_list() - so autovacuum workers don't error out when connecting
- template database used by CREATE DATABASE
- pg_dumpall, so we don't try to connect to the database
- vacuumdb, clusterdb, reindexdb, same. Also pg_amcheck.
That seems like an exhaustive list to me, I was unable to think of any other
place which would need the same treatment. pg_checksums does come to mind but
it clearly can't see the required info, so there doesn't seem to be a
lot to do about that.
I haven't yet added checks to pg_upgrade, even though that's clearly
needed. I'm waffling a bit between erroring out and just ignoring the
database? pg_upgrade already fails when datallowconn is set "wrongly", see
check_proper_datallowconn(). Any opinions?
There don't need to be explicit checks, because pg_upgrade will fail, because
it connects to every database. Obviously the error could be nicer, but it
seems ok for something hopefully very rare. I did add a test ensuring that the
behaviour is caught.
I don't see any pg_upgrade test in the patch?
It's somewhat odd that pg_upgrade prints errors on stdout...
There are many odd things about pg_upgrade logging, updating it to use the
common logging framework of other utils would be nice.
I'm not sure what should be done for psql. It's probably not a good idea to
change tab completion, that'd just make it appear the database is gone. But \l
could probably show dropped databases more prominently?
I have not done that. I wonder if this is something that should be done in the
back branches?
Possibly, I'm not sure where we usually stand on changing the output format of
\ commands in psql in minor revisions.
A few small comments on the patch:
+ * Max connections allowed (-1=no limit, -2=invalid database). A database
+ * is set to invalid partway through eing dropped. Using datconnlimit=-2
+ * for this purpose isn't particularly clean, but is backpatchable.
Typo: s/eing/being/. A limit of -1 makes sense, but the meaning of -2 is less
intuitive IMO. Would it make sense to add a #define with a more descriptive
name to save folks reading this having to grep around and figure out what -2
means?
+ errhint("Use DROP DATABASE to drop invalid databases"));
Should end with a period as a complete sentence?
+ errmsg("cannot alter invalid database \"%s\"", stmt->dbname),
+ errdetail("Use DROP DATABASE to drop invalid databases"));
Shouldn't this be an errhint() instead? Also ending with a period.
+ if (database_is_invalid_form((Form_pg_database) dbform))
+ continue;
Would it make sense to stick a DEBUG2 log entry in there to signal that such a
database exists? (The same would apply for the similar hunk in autovacuum.c.)
--
Daniel Gustafsson
Hi,
On 2023-07-07 14:09:08 +0200, Daniel Gustafsson wrote:
On 25 Jun 2023, at 19:03, Andres Freund <andres@anarazel.de> wrote:
On 2023-06-21 12:02:04 -0700, Andres Freund wrote:
I'm hacking on this bugfix again,
This patch LGTM from reading through and testing (manually and with your
supplied tests in the patch), I think this is a sound approach to deal with
this.
Thanks!
I haven't yet added checks to pg_upgrade, even though that's clearly
needed. I'm waffling a bit between erroring out and just ignoring the
database? pg_upgrade already fails when datallowconn is set "wrongly", see
check_proper_datallowconn(). Any opinions?
There don't need to be explicit checks, because pg_upgrade will fail, because
it connects to every database. Obviously the error could be nicer, but it
seems ok for something hopefully very rare. I did add a test ensuring that the
behaviour is caught.
I don't see any pg_upgrade test in the patch?
Oops, I stashed them alongside some unrelated changes... Included this time.
It's somewhat odd that pg_upgrade prints errors on stdout...
There are many odd things about pg_upgrade logging, updating it to use the
common logging framework of other utils would be nice.
Indeed.
I'm not sure what should be done for psql. It's probably not a good idea to
change tab completion, that'd just make it appear the database is gone. But \l
could probably show dropped databases more prominently?
I have not done that. I wonder if this is something that should be done in the
back branches?
Possibly, I'm not sure where we usually stand on changing the output format of
\ commands in psql in minor revisions.
I'd normally be quite careful, people do script psql.
While breaking things when encountering an invalid database doesn't actually
sound like a bad thing, I don't think it fits into any of the existing column
output by psql for \l.
A few small comments on the patch:
+ * Max connections allowed (-1=no limit, -2=invalid database). A database
+ * is set to invalid partway through eing dropped. Using datconnlimit=-2
+ * for this purpose isn't particularly clean, but is backpatchable.
Typo: s/eing/being/.
Fixed.
A limit of -1 makes sense, but the meaning of -2 is less intuitive IMO.
Would it make sense to add a #define with a more descriptive name to save
folks reading this having to grep around and figure out what -2 means?
I went back and forth about this one. We don't use defines for such things in
all the frontend code today, so the majority of places won't be improved by
adding this. I added them now, which required touching a few otherwise
untouched places, but not too bad.
+ errhint("Use DROP DATABASE to drop invalid databases"));
Should end with a period as a complete sentence?
I get confused about this every time. It's not helped by this example in
sources.sgml:
<programlisting>
Primary: could not create shared memory segment: %m
Detail: Failed syscall was shmget(key=%d, size=%u, 0%o).
Hint: the addendum
</programlisting>
Which notably does not use punctuation for the hint. But indeed, later we say:
<para>
Detail and hint messages: Use complete sentences, and end each with
a period. Capitalize the first word of sentences. Put two spaces after
the period if another sentence follows (for English text; might be
inappropriate in other languages).
</para>
+ errmsg("cannot alter invalid database \"%s\"", stmt->dbname),
+ errdetail("Use DROP DATABASE to drop invalid databases"));
Shouldn't this be an errhint() instead? Also ending with a period.
Yep.
+ if (database_is_invalid_form((Form_pg_database) dbform))
+ continue;
Would it make sense to stick a DEBUG2 log entry in there to signal that such a
database exists? (The same would apply for the similar hunk in autovacuum.c.)
I don't really have an opinion on it. Added.
elog(DEBUG2,
"skipping invalid database \"%s\" while computing relfrozenxid",
NameStr(dbform->datname));
and
elog(DEBUG2,
"autovacuum: skipping invalid database \"%s\"",
NameStr(pgdatabase->datname));
Updated patches attached.
Not looking forward to fixing all the conflicts.
Does anybody have an opinion about whether we should add a dedicated field to
pg_database for representing invalid databases in HEAD? I'm inclined to think
that it's not really worth the cross-version complexity at this point, and
it's not that bad a misuse to use pg_database.datconnlimit.
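As an aside, one nice property of reusing pg_database.datconnlimit is that any leftover invalid database is visible from plain SQL. A minimal sketch (not part of the patch) of how an administrator could spot one:

```sql
-- Sketch: list databases left invalid by an interrupted DROP DATABASE.
-- With this patch series, datconnlimit = -2 (DATCONNLIMIT_INVALID_DB)
-- marks such databases; the only operation they accept is DROP DATABASE.
SELECT datname
FROM pg_database
WHERE datconnlimit = -2;
```

Each database returned would then need a DROP DATABASE to finish cleaning it up.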
Greetings,
Andres Freund
Attachments:
v3-0001-Release-lock-after-encountering-bogs-row-in-vac_t.patch (text/x-diff; charset=us-ascii)
From 6cdbabf5f243d5227588df5bfd3f83018bcefb9a Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 22 Jun 2023 17:27:54 -0700
Subject: [PATCH v3 1/2] Release lock after encountering bogus row in
vac_truncate_clog()
When vac_truncate_clog() encounters bogus datfrozenxid / datminmxid values, it
returns early. Unfortunately, until now, it did not release
WrapLimitsVacuumLock. If the backend later tries to acquire
WrapLimitsVacuumLock, the session / autovacuum worker hangs in an
uncancellable way. Similarly, other sessions will hang waiting for the
lock. However, if the backend holding the lock exited or errored out for some
reason, the lock was released.
The bug was introduced as a side effect of 566372b3d643.
It is interesting that there are no production reports of this problem. That
is likely due to a mix of bugs leading to bogus values having gotten less
common, process exit releasing locks and instances of hangs being hard to
debug for "normal" users.
Discussion: https://postgr.es/m/20230621221208.vhsqgduwfpzwxnpg@awork3.anarazel.de
---
src/backend/commands/vacuum.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 57ca41add2f..841188f71c0 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1893,12 +1893,16 @@ vac_truncate_clog(TransactionId frozenXID,
ereport(WARNING,
(errmsg("some databases have not been vacuumed in over 2 billion transactions"),
errdetail("You might have already suffered transaction-wraparound data loss.")));
+ LWLockRelease(WrapLimitsVacuumLock);
return;
}
/* chicken out if data is bogus in any other way */
if (bogus)
+ {
+ LWLockRelease(WrapLimitsVacuumLock);
return;
+ }
/*
* Advance the oldest value for commit timestamps before truncating, so
--
2.38.0
v3-0002-Handle-DROP-DATABASE-getting-interrupted.patch (text/x-diff; charset=us-ascii)
From 2888a26e8aade0176f35248ab3c06b27d71a9963 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 8 May 2023 18:28:33 -0700
Subject: [PATCH v3 2/2] Handle DROP DATABASE getting interrupted
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20230509004637.cgvmfwrbht7xm7p6@awork3.anarazel.de
Discussion: https://postgr.es/m/20230314174521.74jl6ffqsee5mtug@awork3.anarazel.de
Backpatch: 11-, bug present in all supported versions
---
src/include/catalog/pg_database.h | 20 ++-
src/backend/commands/dbcommands.c | 110 ++++++++++++++---
src/backend/commands/vacuum.c | 14 +++
src/backend/postmaster/autovacuum.c | 12 ++
src/backend/utils/init/postinit.c | 10 ++
src/bin/pg_amcheck/pg_amcheck.c | 2 +-
src/bin/pg_amcheck/t/002_nonesuch.pl | 34 +++++
src/bin/pg_dump/pg_dumpall.c | 4 +-
src/bin/pg_dump/t/002_pg_dump.pl | 17 +++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 26 ++++
src/bin/scripts/clusterdb.c | 2 +-
src/bin/scripts/reindexdb.c | 2 +-
src/bin/scripts/t/011_clusterdb_all.pl | 14 +++
src/bin/scripts/t/050_dropdb.pl | 9 ++
src/bin/scripts/t/091_reindexdb_all.pl | 14 +++
src/bin/scripts/t/101_vacuumdb_all.pl | 14 +++
src/bin/scripts/vacuumdb.c | 2 +-
src/test/recovery/meson.build | 1 +
src/test/recovery/t/037_invalid_database.pl | 130 ++++++++++++++++++++
19 files changed, 410 insertions(+), 27 deletions(-)
create mode 100644 src/test/recovery/t/037_invalid_database.pl
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index d004f4dc8aa..ff5baaf5141 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -49,7 +49,10 @@ CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID
/* new connections allowed? */
bool datallowconn;
- /* max connections allowed (-1=no limit) */
+ /*
+ * Max connections allowed. Negative values have special meaning, see
+ * DATCONNLIMIT_* defines below.
+ */
int32 datconnlimit;
/* all Xids < this are frozen in this DB */
@@ -103,4 +106,19 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_database_oid_index, 2672, DatabaseOidIndexId, on pg
DECLARE_OID_DEFINING_MACRO(Template0DbOid, 4);
DECLARE_OID_DEFINING_MACRO(PostgresDbOid, 5);
+/*
+ * Special values for pg_database.datconnlimit. Normal values are >= 0.
+ */
+#define DATCONNLIMIT_UNLIMITED -1 /* no limit */
+
+/*
+ * A database is set to invalid partway through being dropped. Using
+ * datconnlimit=-2 for this purpose isn't particularly clean, but is
+ * backpatchable.
+ */
+#define DATCONNLIMIT_INVALID_DB -2
+
+extern bool database_is_invalid_form(Form_pg_database datform);
+extern bool database_is_invalid_oid(Oid dboid);
+
#endif /* PG_DATABASE_H */
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 09f1ab41ad3..331046f5191 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -719,7 +719,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
int encoding = -1;
bool dbistemplate = false;
bool dballowconnections = true;
- int dbconnlimit = -1;
+ int dbconnlimit = DATCONNLIMIT_UNLIMITED;
char *dbcollversion = NULL;
int notherbackends;
int npreparedxacts;
@@ -926,7 +926,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
if (dconnlimit && dconnlimit->arg)
{
dbconnlimit = defGetInt32(dconnlimit);
- if (dbconnlimit < -1)
+ if (dbconnlimit < DATCONNLIMIT_UNLIMITED)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid connection limit: %d", dbconnlimit)));
@@ -977,6 +977,16 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
errmsg("template database \"%s\" does not exist",
dbtemplate)));
+ /*
+ * If the source database was in the process of being dropped, we can't
+ * use it as a template.
+ */
+ if (database_is_invalid_oid(src_dboid))
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot use invalid database \"%s\" as template", dbtemplate),
+ errhint("Use DROP DATABASE to drop invalid databases."));
+
/*
* Permission check: to copy a DB that's not marked datistemplate, you
* must be superuser or the owner thereof.
@@ -1576,6 +1586,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
bool db_istemplate;
Relation pgdbrel;
HeapTuple tup;
+ Form_pg_database datform;
int notherbackends;
int npreparedxacts;
int nslots,
@@ -1691,17 +1702,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
dbname),
errdetail_busy_db(notherbackends, npreparedxacts)));
- /*
- * Remove the database's tuple from pg_database.
- */
- tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(db_id));
- if (!HeapTupleIsValid(tup))
- elog(ERROR, "cache lookup failed for database %u", db_id);
-
- CatalogTupleDelete(pgdbrel, &tup->t_self);
-
- ReleaseSysCache(tup);
-
/*
* Delete any comments or security labels associated with the database.
*/
@@ -1718,6 +1718,37 @@ dropdb(const char *dbname, bool missing_ok, bool force)
*/
dropDatabaseDependencies(db_id);
+ /*
+ * Tell the cumulative stats system to forget it immediately, too.
+ */
+ pgstat_drop_database(db_id);
+
+ tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for database %u", db_id);
+ datform = (Form_pg_database) GETSTRUCT(tup);
+
+ /*
+ * Except for the deletion of the catalog row, subsequent actions are not
+ * transactional (consider DropDatabaseBuffers() discarding modified
+ * buffers). But we might crash or get interrupted below. To prevent
+ * accesses to a database with invalid contents, mark the database as
+ * invalid using an in-place update.
+ *
+ * We need to flush the WAL before continuing, to guarantee the
+ * modification is durable before performing irreversible filesystem
+ * operations.
+ */
+ datform->datconnlimit = DATCONNLIMIT_INVALID_DB;
+ heap_inplace_update(pgdbrel, tup);
+ XLogFlush(XactLastRecEnd);
+
+ /*
+ * Also delete the tuple - transactionally. If this transaction commits,
+ * the row will be gone, but if we fail, dropdb() can be invoked again.
+ */
+ CatalogTupleDelete(pgdbrel, &tup->t_self);
+
/*
* Drop db-specific replication slots.
*/
@@ -1730,11 +1761,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
*/
DropDatabaseBuffers(db_id);
- /*
- * Tell the cumulative stats system to forget it immediately, too.
- */
- pgstat_drop_database(db_id);
-
/*
* Tell checkpointer to forget any pending fsync and unlink requests for
* files in the database; else the fsyncs will fail at next checkpoint, or
@@ -2248,7 +2274,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
ListCell *option;
bool dbistemplate = false;
bool dballowconnections = true;
- int dbconnlimit = -1;
+ int dbconnlimit = DATCONNLIMIT_UNLIMITED;
DefElem *distemplate = NULL;
DefElem *dallowconnections = NULL;
DefElem *dconnlimit = NULL;
@@ -2319,7 +2345,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
if (dconnlimit && dconnlimit->arg)
{
dbconnlimit = defGetInt32(dconnlimit);
- if (dbconnlimit < -1)
+ if (dbconnlimit < DATCONNLIMIT_UNLIMITED)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid connection limit: %d", dbconnlimit)));
@@ -2346,6 +2372,14 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
datform = (Form_pg_database) GETSTRUCT(tuple);
dboid = datform->oid;
+ if (datform->datconnlimit == DATCONNLIMIT_INVALID_DB)
+ {
+ ereport(FATAL,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot alter invalid database \"%s\"", stmt->dbname),
+ errhint("Use DROP DATABASE to drop invalid databases."));
+ }
+
if (!object_ownercheck(DatabaseRelationId, dboid, GetUserId()))
aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
stmt->dbname);
@@ -3064,6 +3098,42 @@ get_database_name(Oid dbid)
return result;
}
+
+/*
+ * While dropping a database the pg_database row is marked invalid, but the
+ * catalog contents still exist. Connections to such a database are not
+ * allowed.
+ */
+bool
+database_is_invalid_form(Form_pg_database datform)
+{
+ return datform->datconnlimit == DATCONNLIMIT_INVALID_DB;
+}
+
+
+/*
+ * Convenience wrapper around database_is_invalid_form()
+ */
+bool
+database_is_invalid_oid(Oid dboid)
+{
+ HeapTuple dbtup;
+ Form_pg_database dbform;
+ bool invalid;
+
+ dbtup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dboid));
+ if (!HeapTupleIsValid(dbtup))
+ elog(ERROR, "cache lookup failed for database %u", dboid);
+ dbform = (Form_pg_database) GETSTRUCT(dbtup);
+
+ invalid = database_is_invalid_form(dbform);
+
+ ReleaseSysCache(dbtup);
+
+ return invalid;
+}
+
+
/*
* recovery_create_dbdir()
*
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 841188f71c0..69ac276687b 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1850,6 +1850,20 @@ vac_truncate_clog(TransactionId frozenXID,
Assert(TransactionIdIsNormal(datfrozenxid));
Assert(MultiXactIdIsValid(datminmxid));
+ /*
+ * If database is in the process of getting dropped, or has been
+ * interrupted while doing so, no connections to it are possible
+ * anymore. Therefore we don't need to take it into account here.
+ * Which is good, because it can't be processed by autovacuum either.
+ */
+ if (database_is_invalid_form((Form_pg_database) dbform))
+ {
+ elog(DEBUG2,
+ "skipping invalid database \"%s\" while computing relfrozenxid",
+ NameStr(dbform->datname));
+ continue;
+ }
+
/*
* If things are working properly, no database should have a
* datfrozenxid or datminmxid that is "in the future". However, such
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f929b62e8ad..ae9be9b9113 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1972,6 +1972,18 @@ get_database_list(void)
avw_dbase *avdb;
MemoryContext oldcxt;
+ /*
+ * If database has partially been dropped, we can't, nor need to,
+ * vacuum it.
+ */
+ if (database_is_invalid_form(pgdatabase))
+ {
+ elog(DEBUG2,
+ "autovacuum: skipping invalid database \"%s\"",
+ NameStr(pgdatabase->datname));
+ continue;
+ }
+
/*
* Allocate our results in the caller's context, not the
* transaction's. We do this inside the loop, and restore the original
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 0f9b92b32eb..f31b85c0139 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -1116,6 +1116,7 @@ InitPostgres(const char *in_dbname, Oid dboid,
if (!bootstrap)
{
HeapTuple tuple;
+ Form_pg_database datform;
tuple = GetDatabaseTuple(dbname);
if (!HeapTupleIsValid(tuple) ||
@@ -1125,6 +1126,15 @@ InitPostgres(const char *in_dbname, Oid dboid,
(errcode(ERRCODE_UNDEFINED_DATABASE),
errmsg("database \"%s\" does not exist", dbname),
errdetail("It seems to have just been dropped or renamed.")));
+
+ datform = (Form_pg_database) GETSTRUCT(tuple);
+ if (database_is_invalid_form(datform))
+ {
+ ereport(FATAL,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot connect to invalid database \"%s\"", dbname),
+ errhint("Use DROP DATABASE to drop invalid databases."));
+ }
}
/*
diff --git a/src/bin/pg_amcheck/pg_amcheck.c b/src/bin/pg_amcheck/pg_amcheck.c
index 68f8180c19f..57df14bc1e0 100644
--- a/src/bin/pg_amcheck/pg_amcheck.c
+++ b/src/bin/pg_amcheck/pg_amcheck.c
@@ -1590,7 +1590,7 @@ compile_database_list(PGconn *conn, SimplePtrList *databases,
"FROM pg_catalog.pg_database d "
"LEFT OUTER JOIN exclude_raw e "
"ON d.datname ~ e.rgx "
- "\nWHERE d.datallowconn "
+ "\nWHERE d.datallowconn AND datconnlimit != -2 "
"AND e.pattern_id IS NULL"
"),"
diff --git a/src/bin/pg_amcheck/t/002_nonesuch.pl b/src/bin/pg_amcheck/t/002_nonesuch.pl
index cf2438717e1..c7d6c2426e6 100644
--- a/src/bin/pg_amcheck/t/002_nonesuch.pl
+++ b/src/bin/pg_amcheck/t/002_nonesuch.pl
@@ -291,6 +291,40 @@ $node->command_checks_all(
'many unmatched patterns and one matched pattern under --no-strict-names'
);
+
+#########################################
+# Test that an invalid / partially dropped database won't be targeted
+
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+
+$node->command_checks_all(
+ [
+ 'pg_amcheck', '-d', 'invalid'
+ ],
+ 1,
+ [qr/^$/],
+ [
+ qr/pg_amcheck: error: no connectable databases to check matching "invalid"/,
+ ],
+ 'checking handling of invalid database');
+
+$node->command_checks_all(
+ [
+ 'pg_amcheck', '-d', 'postgres',
+ '-t', 'invalid.public.foo',
+ ],
+ 1,
+ [qr/^$/],
+ [
+ qr/pg_amcheck: error: no connectable databases to check matching "invalid.public.foo"/,
+ ],
+ 'checking handling of object in invalid database');
+
+
#########################################
# Test checking otherwise existent objects but in databases where they do not exist
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 36f134137f9..0ab52ca81d7 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -1345,7 +1345,7 @@ dropDBs(PGconn *conn)
res = executeQuery(conn,
"SELECT datname "
"FROM pg_database d "
- "WHERE datallowconn "
+ "WHERE datallowconn AND datconnlimit != -2 "
"ORDER BY datname");
if (PQntuples(res) > 0)
@@ -1488,7 +1488,7 @@ dumpDatabases(PGconn *conn)
res = executeQuery(conn,
"SELECT datname "
"FROM pg_database d "
- "WHERE datallowconn "
+ "WHERE datallowconn AND datconnlimit != -2 "
"ORDER BY (datname <> 'template1'), datname");
if (PQntuples(res) > 0)
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index d42243bf71e..6bb1f0a159f 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -1907,6 +1907,15 @@ my %tests = (
},
},
+ 'CREATE DATABASE invalid...' => {
+ create_order => 1,
+ create_sql => q(CREATE DATABASE invalid; UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid'),
+ regexp => qr/^CREATE DATABASE invalid/m,
+ not_like => {
+ pg_dumpall_dbprivs => 1,
+ },
+ },
+
'CREATE ACCESS METHOD gist2' => {
create_order => 52,
create_sql =>
@@ -4690,6 +4699,14 @@ command_fails_like(
qr/pg_dump: error: connection to server .* failed: FATAL: database "qqq" does not exist/,
'connecting to a non-existent database');
+#########################################
+# Test connecting to an invalid database
+
+$node->command_fails_like(
+ [ 'pg_dump', '-d', 'invalid' ],
+ qr/pg_dump: error: connection to server .* failed: FATAL: cannot connect to invalid database "invalid"/,
+ 'connecting to an invalid database');
+
#########################################
# Test connecting with an unprivileged user
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 41fce089d68..a5688a1cf2a 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -317,6 +317,12 @@ if (defined($ENV{oldinstall}))
}
}
+# Create an invalid database, will be deleted below
+$oldnode->safe_psql('postgres', qq(
+ CREATE DATABASE regression_invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'regression_invalid';
+));
+
# In a VPATH build, we'll be started in the source directory, but we want
# to run pg_upgrade in the build directory so that any files generated finish
# in it, like delete_old_cluster.{sh,bat}.
@@ -345,6 +351,26 @@ ok(-d $newnode->data_dir . "/pg_upgrade_output.d",
"pg_upgrade_output.d/ not removed after pg_upgrade failure");
rmtree($newnode->data_dir . "/pg_upgrade_output.d");
+# Check that pg_upgrade aborts when encountering an invalid database
+command_checks_all(
+ [
+ 'pg_upgrade', '--no-sync', '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir, '-b', $oldbindir,
+ '-B', $newbindir, '-s', $newnode->host,
+ '-p', $oldnode->port, '-P', $newnode->port,
+ $mode, '--check',
+ ],
+ 1,
+ [qr/invalid/], # pg_upgrade prints errors on stdout :(
+ [qr//],
+ 'invalid database causes failure');
+rmtree($newnode->data_dir . "/pg_upgrade_output.d");
+
+# And drop it, so we can continue
+$oldnode->start;
+$oldnode->safe_psql('postgres', 'DROP DATABASE regression_invalid');
+$oldnode->stop;
+
# --check command works here, cleans up pg_upgrade_output.d.
command_ok(
[
diff --git a/src/bin/scripts/clusterdb.c b/src/bin/scripts/clusterdb.c
index e3585b3272e..3fea3a8e268 100644
--- a/src/bin/scripts/clusterdb.c
+++ b/src/bin/scripts/clusterdb.c
@@ -234,7 +234,7 @@ cluster_all_databases(ConnParams *cparams, const char *progname,
int i;
conn = connectMaintenanceDatabase(cparams, progname, echo);
- result = executeQuery(conn, "SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;", echo);
+ result = executeQuery(conn, "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;", echo);
PQfinish(conn);
for (i = 0; i < PQntuples(result); i++)
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index 3e8f6aca40e..5e5b075d563 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -718,7 +718,7 @@ reindex_all_databases(ConnParams *cparams,
int i;
conn = connectMaintenanceDatabase(cparams, progname, echo);
- result = executeQuery(conn, "SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;", echo);
+ result = executeQuery(conn, "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;", echo);
PQfinish(conn);
for (i = 0; i < PQntuples(result); i++)
diff --git a/src/bin/scripts/t/011_clusterdb_all.pl b/src/bin/scripts/t/011_clusterdb_all.pl
index eb904c08c70..2abfd1c0a97 100644
--- a/src/bin/scripts/t/011_clusterdb_all.pl
+++ b/src/bin/scripts/t/011_clusterdb_all.pl
@@ -21,4 +21,18 @@ $node->issues_sql_like(
qr/statement: CLUSTER.*statement: CLUSTER/s,
'cluster all databases');
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+$node->command_ok([ 'clusterdb', '-a' ],
+ 'invalid database not targeted by clusterdb -a');
+
+# Doesn't quite belong here, but don't want to waste time by creating an
+# invalid database in 010_clusterdb.pl as well.
+$node->command_fails_like([ 'clusterdb', '-d', 'invalid'],
+ qr/FATAL: cannot connect to invalid database "invalid"/,
+ 'clusterdb cannot target invalid database');
+
done_testing();
diff --git a/src/bin/scripts/t/050_dropdb.pl b/src/bin/scripts/t/050_dropdb.pl
index 9f2b6463b82..8a55179114e 100644
--- a/src/bin/scripts/t/050_dropdb.pl
+++ b/src/bin/scripts/t/050_dropdb.pl
@@ -31,4 +31,13 @@ $node->issues_sql_like(
$node->command_fails([ 'dropdb', 'nonexistent' ],
'fails with nonexistent database');
+# check that invalid database can be dropped with dropdb
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+$node->command_ok([ 'dropdb', 'invalid' ],
+ 'invalid database can be dropped');
+
done_testing();
diff --git a/src/bin/scripts/t/091_reindexdb_all.pl b/src/bin/scripts/t/091_reindexdb_all.pl
index ac62b9b5585..4363e0c900a 100644
--- a/src/bin/scripts/t/091_reindexdb_all.pl
+++ b/src/bin/scripts/t/091_reindexdb_all.pl
@@ -18,4 +18,18 @@ $node->issues_sql_like(
qr/statement: REINDEX.*statement: REINDEX/s,
'reindex all databases');
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+$node->command_ok([ 'reindexdb', '-a' ],
+ 'invalid database not targeted by reindexdb -a');
+
+# Doesn't quite belong here, but don't want to waste time by creating an
+# invalid database in 090_reindexdb.pl as well.
+$node->command_fails_like([ 'reindexdb', '-d', 'invalid'],
+ qr/FATAL: cannot connect to invalid database "invalid"/,
+ 'reindexdb cannot target invalid database');
+
done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 0f9d5adc48b..e49c3cadc62 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -16,4 +16,18 @@ $node->issues_sql_like(
qr/statement: VACUUM.*statement: VACUUM/s,
'vacuum all databases');
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+$node->command_ok([ 'vacuumdb', '-a' ],
+ 'invalid database not targeted by vacuumdb -a');
+
+# Doesn't quite belong here, but don't want to waste time by creating an
+# invalid database in 010_vacuumdb.pl as well.
+$node->command_fails_like([ 'vacuumdb', '-d', 'invalid'],
+ qr/FATAL: cannot connect to invalid database "invalid"/,
+ 'vacuumdb cannot target invalid database');
+
done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 4b17a070890..f03d5b1c6cb 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -883,7 +883,7 @@ vacuum_all_databases(ConnParams *cparams,
conn = connectMaintenanceDatabase(cparams, progname, echo);
result = executeQuery(conn,
- "SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;",
+ "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
echo);
PQfinish(conn);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 20089580100..e7328e48944 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -42,6 +42,7 @@ tests += {
't/034_create_database.pl',
't/035_standby_logical_decoding.pl',
't/036_truncated_dropped.pl',
+ 't/037_invalid_database.pl',
],
},
}
diff --git a/src/test/recovery/t/037_invalid_database.pl b/src/test/recovery/t/037_invalid_database.pl
new file mode 100644
index 00000000000..69a3531afc4
--- /dev/null
+++ b/src/test/recovery/t/037_invalid_database.pl
@@ -0,0 +1,130 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+#
+# Test we handle interrupted DROP DATABASE correctly.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init;
+$node->append_conf(
+ "postgresql.conf", qq(
+autovacuum = off
+max_prepared_transactions=5
+log_min_duration_statement=0
+log_connections=on
+log_disconnections=on
+));
+
+$node->start;
+
+
+# First verify that we can't connect to or ALTER an invalid database. Just
+# mark the database as invalid ourselves, that's more reliable than hitting the
+# required race conditions (see testing further down)...
+
+$node->safe_psql(
+ "postgres", qq(
+CREATE DATABASE invalid;
+UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+
+my $psql_stdout = '';
+my $psql_stderr = '';
+diag $psql_stderr;
+diag $psql_stdout;
+
+ok($node->psql('invalid', '', stderr=>\$psql_stderr) == 2,
+ "can't connect to invalid database - error code");
+like($psql_stderr, qr/FATAL:\s+cannot connect to invalid database "invalid"/,
+ "can't connect to invalid database - error message");
+
+ok( $node->psql('postgres', 'ALTER DATABASE invalid CONNECTION LIMIT 10') ==
+ 2,
+ "can't ALTER invalid database");
+
+# check invalid database can't be used as a template
+ok( $node->psql('postgres', 'CREATE DATABASE copy_invalid TEMPLATE invalid')
+ == 3,
+ "can't use invalid database as template");
+
+
+# Verify that VACUUM ignores an invalid database when computing how much of
+# the clog is needed (vac_truncate_clog()). For that we modify the pg_database
+# row of the invalid database to have an outdated datfrozenxid.
+$psql_stderr = '';
+$node->psql(
+ 'postgres',
+ qq(
+UPDATE pg_database SET datfrozenxid = '123456' WHERE datname = 'invalid';
+DROP TABLE IF EXISTS foo_tbl; CREATE TABLE foo_tbl();
+VACUUM FREEZE;),
+ stderr => \$psql_stderr);
+unlike(
+ $psql_stderr,
+ qr/some databases have not been vacuumed in over 2 billion transactions/,
+ "invalid databases are ignored by vac_truncate_clog");
+
+
+# But we need to be able to drop an invalid database.
+ok($node->psql('postgres', 'DROP DATABASE invalid', stdout => \$psql_stdout, stderr => \$psql_stderr) == 0,
+ "can DROP invalid database");
+
+# Ensure database is gone
+ok($node->psql('postgres', 'DROP DATABASE invalid') == 3,
+ "can't drop already dropped database");
+
+
+# Test that interruption of DROP DATABASE is handled properly. To ensure the
+# interruption happens at the appropriate moment, we lock pg_tablespace. DROP
+# DATABASE scans pg_tablespace once it has reached the "irreversible" part of
+# dropping the database, making it a suitable point to wait.
+my $bgpsql = $node->background_psql('postgres', on_error_stop => 0);
+my $pid = $bgpsql->query('SELECT pg_backend_pid()');
+note "background pid is $pid";
+
+# create the database, prevent drop database via lock held by a 2PC transaction
+$bgpsql->query_safe(
+ qq(
+ CREATE DATABASE invalid_interrupt;
+ BEGIN;
+ LOCK pg_tablespace;
+ PREPARE TRANSACTION 'lock_tblspc';));
+
+# Try to drop. This will wait due to the still held lock.
+$bgpsql->query_until(qr//, "DROP DATABASE invalid_interrupt;\n");
+
+# Ensure we're waiting for the lock
+$node->poll_query_until('postgres',
+ qq(SELECT EXISTS(SELECT * FROM pg_locks WHERE NOT granted AND relation = 'pg_tablespace'::regclass AND mode = 'AccessShareLock');)
+);
+
+# and finally interrupt the DROP DATABASE
+ok($node->safe_psql('postgres', "SELECT pg_cancel_backend($pid)"),
+ "cancelled DROP DATABASE");
+
+# wait for cancellation to be processed
+pump_until(
+ $bgpsql->{run}, $bgpsql->{timeout}, \$bgpsql->{stderr},
+ (qr/canceling statement due to user request/, ";\n"),
+ "cancel processed");
+$bgpsql->{stderr} = '';
+
+# verify that connections to the database aren't allowed
+ok( $node->psql('invalid_interrupt', '') == 2,
+ "can't connect to invalid_interrupt database");
+
+# To properly drop the database, we need to release the lock that previously
+# prevented doing so.
+ok($bgpsql->query_safe(qq(ROLLBACK PREPARED 'lock_tblspc')),
+ "unblock DROP DATABASE");
+
+ok($bgpsql->query(qq(DROP DATABASE invalid_interrupt)),
+ "DROP DATABASE invalid_interrupt");
+
+$bgpsql->quit();
+
+done_testing();
--
2.38.0
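The state exercised by these tests can also be reproduced by hand in a plain psql session, roughly like this (a sketch; the error and hint texts follow the messages the patch adds, exact client output may vary by version):

```
postgres=# CREATE DATABASE invalid;
CREATE DATABASE
postgres=# UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
UPDATE 1
postgres=# \c invalid
connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL:  cannot connect to invalid database "invalid"
HINT:  Use DROP DATABASE to drop invalid databases.
Previous connection kept
postgres=# DROP DATABASE invalid;
DROP DATABASE
```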
On 12 Jul 2023, at 03:59, Andres Freund <andres@anarazel.de> wrote:

> On 2023-07-07 14:09:08 +0200, Daniel Gustafsson wrote:
>> On 25 Jun 2023, at 19:03, Andres Freund <andres@anarazel.de> wrote:
>>> On 2023-06-21 12:02:04 -0700, Andres Freund wrote:
>>> There don't need to be explicit checks, because pg_upgrade will fail, because
>>> it connects to every database. Obviously the error could be nicer, but it
>>> seems ok for something hopefully very rare. I did add a test ensuring that the
>>> behaviour is caught.
>>
>> I don't see any pg_upgrade test in the patch?
>
> Oops, I stashed them alongside some unrelated changes... Included this time.

Looking more at this I wonder if we in HEAD should make this a bit nicer by
extending the --check phase to catch this? I did a quick hack along these
lines in the 0003 commit attached here (0001 and 0002 are your unchanged
patches, just added for consistency and to be CFBot compatible). If done it
could be a separate commit to make the 0002 patch backport cleaner of course.

>>>> I'm not sure what should be done for psql. It's probably not a good idea to
>>>> change tab completion, that'd just make it appear the database is gone. But \l
>>>> could probably show dropped databases more prominently?
>>>
>>> I have not done that. I wonder if this is something that should be done in the
>>> back branches?
>>
>> Possibly, I'm not sure where we usually stand on changing the output format of
>> \ commands in psql in minor revisions.
>
> I'd normally be quite careful, people do script psql.
>
> While breaking things when encountering an invalid database doesn't actually
> sound like a bad thing, I don't think it fits into any of the existing column
> output by psql for \l.

Agreed, it doesn't, it would have to be a new column.

>> + errhint("Use DROP DATABASE to drop invalid databases"));
>>
>> Should end with a period as a complete sentence?
>
> I get confused about this every time. It's not helped by this example in
> sources.sgml:
>
> <programlisting>
> Primary:    could not create shared memory segment: %m
> Detail:     Failed syscall was shmget(key=%d, size=%u, 0%o).
> Hint:       the addendum
> </programlisting>
>
> Which notably does not use punctuation for the hint. But indeed, later we say:
>
> <para>
>   Detail and hint messages: Use complete sentences, and end each with
>   a period.  Capitalize the first word of sentences.  Put two spaces after
>   the period if another sentence follows (for English text; might be
>   inappropriate in other languages).
> </para>

That's not a very helpful example, and one which may give the wrong impression
unless the entire page is read. I've raised this with a small diff to improve
it on -docs.

> Updated patches attached.

This version of the patchset LGTM.

> Does anybody have an opinion about whether we should add a dedicated field to
> pg_database for representing invalid databases in HEAD? I'm inclined to think
> that it's not really worth the cross-version complexity at this point, and
> it's not that bad a misuse to use pg_database.datconnlimit.

FWIW I think we should use pg_database.datconnlimit for this, it doesn't seem
like a common enough problem to warrant the added complexity and cost.
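One practical consequence of reusing datconnlimit: scripts or monitoring that want to spot databases the maintenance tools will now skip can use the same catalog predicate the patch adds to clusterdb, reindexdb, vacuumdb and pg_dumpall. A sketch:

```sql
-- Databases that are not connectable, either because connections are
-- disallowed or because an interrupted DROP DATABASE left them invalid
-- (datconnlimit = -2, i.e. DATCONNLIMIT_INVALID_DB).
SELECT datname, datallowconn, datconnlimit
FROM pg_database
WHERE NOT datallowconn OR datconnlimit = -2;
```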
--
Daniel Gustafsson
Attachments:
v4-0002-Handle-DROP-DATABASE-getting-interrupted.patch (application/octet-stream)
From 34d90ba24038869d8a326f38793faffabab291c5 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 8 May 2023 18:28:33 -0700
Subject: [PATCH v4 2/3] Handle DROP DATABASE getting interrupted
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20230509004637.cgvmfwrbht7xm7p6@awork3.anarazel.de
Discussion: https://postgr.es/m/20230314174521.74jl6ffqsee5mtug@awork3.anarazel.de
Backpatch: 11-, bug present in all supported versions
---
src/backend/commands/dbcommands.c | 110 ++++++++++++++---
src/backend/commands/vacuum.c | 14 +++
src/backend/postmaster/autovacuum.c | 12 ++
src/backend/utils/init/postinit.c | 10 ++
src/bin/pg_amcheck/pg_amcheck.c | 2 +-
src/bin/pg_amcheck/t/002_nonesuch.pl | 34 +++++
src/bin/pg_dump/pg_dumpall.c | 4 +-
src/bin/pg_dump/t/002_pg_dump.pl | 17 +++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 26 ++++
src/bin/scripts/clusterdb.c | 2 +-
src/bin/scripts/reindexdb.c | 2 +-
src/bin/scripts/t/011_clusterdb_all.pl | 14 +++
src/bin/scripts/t/050_dropdb.pl | 9 ++
src/bin/scripts/t/091_reindexdb_all.pl | 14 +++
src/bin/scripts/t/101_vacuumdb_all.pl | 14 +++
src/bin/scripts/vacuumdb.c | 2 +-
src/include/catalog/pg_database.h | 20 ++-
src/test/recovery/meson.build | 1 +
src/test/recovery/t/037_invalid_database.pl | 130 ++++++++++++++++++++
19 files changed, 410 insertions(+), 27 deletions(-)
create mode 100644 src/test/recovery/t/037_invalid_database.pl
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 09f1ab41ad..331046f519 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -719,7 +719,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
int encoding = -1;
bool dbistemplate = false;
bool dballowconnections = true;
- int dbconnlimit = -1;
+ int dbconnlimit = DATCONNLIMIT_UNLIMITED;
char *dbcollversion = NULL;
int notherbackends;
int npreparedxacts;
@@ -926,7 +926,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
if (dconnlimit && dconnlimit->arg)
{
dbconnlimit = defGetInt32(dconnlimit);
- if (dbconnlimit < -1)
+ if (dbconnlimit < DATCONNLIMIT_UNLIMITED)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid connection limit: %d", dbconnlimit)));
@@ -977,6 +977,16 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
errmsg("template database \"%s\" does not exist",
dbtemplate)));
+ /*
+ * If the source database was in the process of being dropped, we can't
+ * use it as a template.
+ */
+ if (database_is_invalid_oid(src_dboid))
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot use invalid database \"%s\" as template", dbtemplate),
+ errhint("Use DROP DATABASE to drop invalid databases."));
+
/*
* Permission check: to copy a DB that's not marked datistemplate, you
* must be superuser or the owner thereof.
@@ -1576,6 +1586,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
bool db_istemplate;
Relation pgdbrel;
HeapTuple tup;
+ Form_pg_database datform;
int notherbackends;
int npreparedxacts;
int nslots,
@@ -1691,17 +1702,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
dbname),
errdetail_busy_db(notherbackends, npreparedxacts)));
- /*
- * Remove the database's tuple from pg_database.
- */
- tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(db_id));
- if (!HeapTupleIsValid(tup))
- elog(ERROR, "cache lookup failed for database %u", db_id);
-
- CatalogTupleDelete(pgdbrel, &tup->t_self);
-
- ReleaseSysCache(tup);
-
/*
* Delete any comments or security labels associated with the database.
*/
@@ -1718,6 +1718,37 @@ dropdb(const char *dbname, bool missing_ok, bool force)
*/
dropDatabaseDependencies(db_id);
+ /*
+ * Tell the cumulative stats system to forget it immediately, too.
+ */
+ pgstat_drop_database(db_id);
+
+ tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for database %u", db_id);
+ datform = (Form_pg_database) GETSTRUCT(tup);
+
+ /*
+ * Except for the deletion of the catalog row, subsequent actions are not
+ * transactional (consider DropDatabaseBuffers() discarding modified
+ * buffers). But we might crash or get interrupted below. To prevent
+ * accesses to a database with invalid contents, mark the database as
+ * invalid using an in-place update.
+ *
+ * We need to flush the WAL before continuing, to guarantee the
+ * modification is durable before performing irreversible filesystem
+ * operations.
+ */
+ datform->datconnlimit = DATCONNLIMIT_INVALID_DB;
+ heap_inplace_update(pgdbrel, tup);
+ XLogFlush(XactLastRecEnd);
+
+ /*
+ * Also delete the tuple - transactionally. If this transaction commits,
+ * the row will be gone, but if we fail, dropdb() can be invoked again.
+ */
+ CatalogTupleDelete(pgdbrel, &tup->t_self);
+
/*
* Drop db-specific replication slots.
*/
@@ -1730,11 +1761,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
*/
DropDatabaseBuffers(db_id);
- /*
- * Tell the cumulative stats system to forget it immediately, too.
- */
- pgstat_drop_database(db_id);
-
/*
* Tell checkpointer to forget any pending fsync and unlink requests for
* files in the database; else the fsyncs will fail at next checkpoint, or
@@ -2248,7 +2274,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
ListCell *option;
bool dbistemplate = false;
bool dballowconnections = true;
- int dbconnlimit = -1;
+ int dbconnlimit = DATCONNLIMIT_UNLIMITED;
DefElem *distemplate = NULL;
DefElem *dallowconnections = NULL;
DefElem *dconnlimit = NULL;
@@ -2319,7 +2345,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
if (dconnlimit && dconnlimit->arg)
{
dbconnlimit = defGetInt32(dconnlimit);
- if (dbconnlimit < -1)
+ if (dbconnlimit < DATCONNLIMIT_UNLIMITED)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid connection limit: %d", dbconnlimit)));
@@ -2346,6 +2372,14 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
datform = (Form_pg_database) GETSTRUCT(tuple);
dboid = datform->oid;
+ if (datform->datconnlimit == DATCONNLIMIT_INVALID_DB)
+ {
+ ereport(FATAL,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot alter invalid database \"%s\"", stmt->dbname),
+ errhint("Use DROP DATABASE to drop invalid databases."));
+ }
+
if (!object_ownercheck(DatabaseRelationId, dboid, GetUserId()))
aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
stmt->dbname);
@@ -3064,6 +3098,42 @@ get_database_name(Oid dbid)
return result;
}
+
+/*
+ * While dropping a database the pg_database row is marked invalid, but the
+ * catalog contents still exist. Connections to such a database are not
+ * allowed.
+ */
+bool
+database_is_invalid_form(Form_pg_database datform)
+{
+ return datform->datconnlimit == DATCONNLIMIT_INVALID_DB;
+}
+
+
+/*
+ * Convenience wrapper around database_is_invalid_form()
+ */
+bool
+database_is_invalid_oid(Oid dboid)
+{
+ HeapTuple dbtup;
+ Form_pg_database dbform;
+ bool invalid;
+
+ dbtup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dboid));
+ if (!HeapTupleIsValid(dbtup))
+ elog(ERROR, "cache lookup failed for database %u", dboid);
+ dbform = (Form_pg_database) GETSTRUCT(dbtup);
+
+ invalid = database_is_invalid_form(dbform);
+
+ ReleaseSysCache(dbtup);
+
+ return invalid;
+}
+
+
/*
* recovery_create_dbdir()
*
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 841188f71c..69ac276687 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1850,6 +1850,20 @@ vac_truncate_clog(TransactionId frozenXID,
Assert(TransactionIdIsNormal(datfrozenxid));
Assert(MultiXactIdIsValid(datminmxid));
+ /*
+ * If a database is in the process of getting dropped, or has been
+ * interrupted while doing so, no connections to it are possible
+ * anymore. Therefore we don't need to take it into account here.
+ * Which is good, because it can't be processed by autovacuum either.
+ */
+ if (database_is_invalid_form((Form_pg_database) dbform))
+ {
+ elog(DEBUG2,
+ "skipping invalid database \"%s\" while computing relfrozenxid",
+ NameStr(dbform->datname));
+ continue;
+ }
+
/*
* If things are working properly, no database should have a
* datfrozenxid or datminmxid that is "in the future". However, such
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f929b62e8a..ae9be9b911 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1972,6 +1972,18 @@ get_database_list(void)
avw_dbase *avdb;
MemoryContext oldcxt;
+ /*
+ * If the database has been partially dropped, we can't vacuum it,
+ * nor do we need to.
+ */
+ if (database_is_invalid_form(pgdatabase))
+ {
+ elog(DEBUG2,
+ "autovacuum: skipping invalid database \"%s\"",
+ NameStr(pgdatabase->datname));
+ continue;
+ }
+
/*
* Allocate our results in the caller's context, not the
* transaction's. We do this inside the loop, and restore the original
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 0f9b92b32e..f31b85c013 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -1116,6 +1116,7 @@ InitPostgres(const char *in_dbname, Oid dboid,
if (!bootstrap)
{
HeapTuple tuple;
+ Form_pg_database datform;
tuple = GetDatabaseTuple(dbname);
if (!HeapTupleIsValid(tuple) ||
@@ -1125,6 +1126,15 @@ InitPostgres(const char *in_dbname, Oid dboid,
(errcode(ERRCODE_UNDEFINED_DATABASE),
errmsg("database \"%s\" does not exist", dbname),
errdetail("It seems to have just been dropped or renamed.")));
+
+ datform = (Form_pg_database) GETSTRUCT(tuple);
+ if (database_is_invalid_form(datform))
+ {
+ ereport(FATAL,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot connect to invalid database \"%s\"", dbname),
+ errhint("Use DROP DATABASE to drop invalid databases."));
+ }
}
/*
diff --git a/src/bin/pg_amcheck/pg_amcheck.c b/src/bin/pg_amcheck/pg_amcheck.c
index 68f8180c19..57df14bc1e 100644
--- a/src/bin/pg_amcheck/pg_amcheck.c
+++ b/src/bin/pg_amcheck/pg_amcheck.c
@@ -1590,7 +1590,7 @@ compile_database_list(PGconn *conn, SimplePtrList *databases,
"FROM pg_catalog.pg_database d "
"LEFT OUTER JOIN exclude_raw e "
"ON d.datname ~ e.rgx "
- "\nWHERE d.datallowconn "
+ "\nWHERE d.datallowconn AND datconnlimit != -2 "
"AND e.pattern_id IS NULL"
"),"
diff --git a/src/bin/pg_amcheck/t/002_nonesuch.pl b/src/bin/pg_amcheck/t/002_nonesuch.pl
index cf2438717e..c7d6c2426e 100644
--- a/src/bin/pg_amcheck/t/002_nonesuch.pl
+++ b/src/bin/pg_amcheck/t/002_nonesuch.pl
@@ -291,6 +291,40 @@ $node->command_checks_all(
'many unmatched patterns and one matched pattern under --no-strict-names'
);
+
+#########################################
+# Test that an invalid / partially dropped database won't be targeted
+
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+
+$node->command_checks_all(
+ [
+ 'pg_amcheck', '-d', 'invalid'
+ ],
+ 1,
+ [qr/^$/],
+ [
+ qr/pg_amcheck: error: no connectable databases to check matching "invalid"/,
+ ],
+ 'checking handling of invalid database');
+
+$node->command_checks_all(
+ [
+ 'pg_amcheck', '-d', 'postgres',
+ '-t', 'invalid.public.foo',
+ ],
+ 1,
+ [qr/^$/],
+ [
+ qr/pg_amcheck: error: no connectable databases to check matching "invalid.public.foo"/,
+ ],
+ 'checking handling of object in invalid database');
+
+
#########################################
# Test checking otherwise existent objects but in databases where they do not exist
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 36f134137f..0ab52ca81d 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -1345,7 +1345,7 @@ dropDBs(PGconn *conn)
res = executeQuery(conn,
"SELECT datname "
"FROM pg_database d "
- "WHERE datallowconn "
+ "WHERE datallowconn AND datconnlimit != -2 "
"ORDER BY datname");
if (PQntuples(res) > 0)
@@ -1488,7 +1488,7 @@ dumpDatabases(PGconn *conn)
res = executeQuery(conn,
"SELECT datname "
"FROM pg_database d "
- "WHERE datallowconn "
+ "WHERE datallowconn AND datconnlimit != -2 "
"ORDER BY (datname <> 'template1'), datname");
if (PQntuples(res) > 0)
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index d42243bf71..6bb1f0a159 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -1907,6 +1907,15 @@ my %tests = (
},
},
+ 'CREATE DATABASE invalid...' => {
+ create_order => 1,
+ create_sql => q(CREATE DATABASE invalid; UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid'),
+ regexp => qr/^CREATE DATABASE invalid/m,
+ not_like => {
+ pg_dumpall_dbprivs => 1,
+ },
+ },
+
'CREATE ACCESS METHOD gist2' => {
create_order => 52,
create_sql =>
@@ -4690,6 +4699,14 @@ command_fails_like(
qr/pg_dump: error: connection to server .* failed: FATAL: database "qqq" does not exist/,
'connecting to a non-existent database');
+#########################################
+# Test connecting to an invalid database
+
+$node->command_fails_like(
+ [ 'pg_dump', '-d', 'invalid' ],
+ qr/pg_dump: error: connection to server .* failed: FATAL: cannot connect to invalid database "invalid"/,
+ 'connecting to an invalid database');
+
#########################################
# Test connecting with an unprivileged user
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 41fce089d6..a5688a1cf2 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -317,6 +317,12 @@ if (defined($ENV{oldinstall}))
}
}
+# Create an invalid database, will be deleted below
+$oldnode->safe_psql('postgres', qq(
+ CREATE DATABASE regression_invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'regression_invalid';
+));
+
# In a VPATH build, we'll be started in the source directory, but we want
# to run pg_upgrade in the build directory so that any files generated finish
# in it, like delete_old_cluster.{sh,bat}.
@@ -345,6 +351,26 @@ ok(-d $newnode->data_dir . "/pg_upgrade_output.d",
"pg_upgrade_output.d/ not removed after pg_upgrade failure");
rmtree($newnode->data_dir . "/pg_upgrade_output.d");
+# Check that pg_upgrade aborts when encountering an invalid database
+command_checks_all(
+ [
+ 'pg_upgrade', '--no-sync', '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir, '-b', $oldbindir,
+ '-B', $newbindir, '-s', $newnode->host,
+ '-p', $oldnode->port, '-P', $newnode->port,
+ $mode, '--check',
+ ],
+ 1,
+ [qr/invalid/], # pg_upgrade prints errors on stdout :(
+ [qr//],
+ 'invalid database causes failure');
+rmtree($newnode->data_dir . "/pg_upgrade_output.d");
+
+# And drop it, so we can continue
+$oldnode->start;
+$oldnode->safe_psql('postgres', 'DROP DATABASE regression_invalid');
+$oldnode->stop;
+
# --check command works here, cleans up pg_upgrade_output.d.
command_ok(
[
diff --git a/src/bin/scripts/clusterdb.c b/src/bin/scripts/clusterdb.c
index e3585b3272..3fea3a8e26 100644
--- a/src/bin/scripts/clusterdb.c
+++ b/src/bin/scripts/clusterdb.c
@@ -234,7 +234,7 @@ cluster_all_databases(ConnParams *cparams, const char *progname,
int i;
conn = connectMaintenanceDatabase(cparams, progname, echo);
- result = executeQuery(conn, "SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;", echo);
+ result = executeQuery(conn, "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;", echo);
PQfinish(conn);
for (i = 0; i < PQntuples(result); i++)
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index 3e8f6aca40..5e5b075d56 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -718,7 +718,7 @@ reindex_all_databases(ConnParams *cparams,
int i;
conn = connectMaintenanceDatabase(cparams, progname, echo);
- result = executeQuery(conn, "SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;", echo);
+ result = executeQuery(conn, "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;", echo);
PQfinish(conn);
for (i = 0; i < PQntuples(result); i++)
diff --git a/src/bin/scripts/t/011_clusterdb_all.pl b/src/bin/scripts/t/011_clusterdb_all.pl
index eb904c08c7..2abfd1c0a9 100644
--- a/src/bin/scripts/t/011_clusterdb_all.pl
+++ b/src/bin/scripts/t/011_clusterdb_all.pl
@@ -21,4 +21,18 @@ $node->issues_sql_like(
qr/statement: CLUSTER.*statement: CLUSTER/s,
'cluster all databases');
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+$node->command_ok([ 'clusterdb', '-a' ],
+ 'invalid database not targeted by clusterdb -a');
+
+# Doesn't quite belong here, but don't want to waste time by creating an
+# invalid database in 010_clusterdb.pl as well.
+$node->command_fails_like([ 'clusterdb', '-d', 'invalid'],
+ qr/FATAL: cannot connect to invalid database "invalid"/,
+ 'clusterdb cannot target invalid database');
+
done_testing();
diff --git a/src/bin/scripts/t/050_dropdb.pl b/src/bin/scripts/t/050_dropdb.pl
index 9f2b6463b8..8a55179114 100644
--- a/src/bin/scripts/t/050_dropdb.pl
+++ b/src/bin/scripts/t/050_dropdb.pl
@@ -31,4 +31,13 @@ $node->issues_sql_like(
$node->command_fails([ 'dropdb', 'nonexistent' ],
'fails with nonexistent database');
+# check that invalid database can be dropped with dropdb
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+$node->command_ok([ 'dropdb', 'invalid' ],
+ 'invalid database can be dropped');
+
done_testing();
diff --git a/src/bin/scripts/t/091_reindexdb_all.pl b/src/bin/scripts/t/091_reindexdb_all.pl
index ac62b9b558..4363e0c900 100644
--- a/src/bin/scripts/t/091_reindexdb_all.pl
+++ b/src/bin/scripts/t/091_reindexdb_all.pl
@@ -18,4 +18,18 @@ $node->issues_sql_like(
qr/statement: REINDEX.*statement: REINDEX/s,
'reindex all databases');
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+$node->command_ok([ 'reindexdb', '-a' ],
+ 'invalid database not targeted by reindexdb -a');
+
+# Doesn't quite belong here, but don't want to waste time by creating an
+# invalid database in 090_reindexdb.pl as well.
+$node->command_fails_like([ 'reindexdb', '-d', 'invalid'],
+ qr/FATAL: cannot connect to invalid database "invalid"/,
+ 'reindexdb cannot target invalid database');
+
done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 0f9d5adc48..e49c3cadc6 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -16,4 +16,18 @@ $node->issues_sql_like(
qr/statement: VACUUM.*statement: VACUUM/s,
'vacuum all databases');
+$node->safe_psql(
+ 'postgres', q(
+ CREATE DATABASE invalid;
+ UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+$node->command_ok([ 'vacuumdb', '-a' ],
+ 'invalid database not targeted by vacuumdb -a');
+
+# Doesn't quite belong here, but don't want to waste time by creating an
+# invalid database in 010_vacuumdb.pl as well.
+$node->command_fails_like([ 'vacuumdb', '-d', 'invalid'],
+ qr/FATAL: cannot connect to invalid database "invalid"/,
+ 'vacuumdb cannot target invalid database');
+
done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 4b17a07089..f03d5b1c6c 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -883,7 +883,7 @@ vacuum_all_databases(ConnParams *cparams,
conn = connectMaintenanceDatabase(cparams, progname, echo);
result = executeQuery(conn,
- "SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;",
+ "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
echo);
PQfinish(conn);
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index d004f4dc8a..ff5baaf514 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -49,7 +49,10 @@ CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID
/* new connections allowed? */
bool datallowconn;
- /* max connections allowed (-1=no limit) */
+ /*
+ * Max connections allowed. Negative values have special meaning, see
+ * DATCONNLIMIT_* defines below.
+ */
int32 datconnlimit;
/* all Xids < this are frozen in this DB */
@@ -103,4 +106,19 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_database_oid_index, 2672, DatabaseOidIndexId, on pg
DECLARE_OID_DEFINING_MACRO(Template0DbOid, 4);
DECLARE_OID_DEFINING_MACRO(PostgresDbOid, 5);
+/*
+ * Special values for pg_database.datconnlimit. Normal values are >= 0.
+ */
+#define DATCONNLIMIT_UNLIMITED -1 /* no limit */
+
+/*
+ * A database is set to invalid partway through being dropped. Using
+ * datconnlimit=-2 for this purpose isn't particularly clean, but is
+ * backpatchable.
+ */
+#define DATCONNLIMIT_INVALID_DB -2
+
+extern bool database_is_invalid_form(Form_pg_database datform);
+extern bool database_is_invalid_oid(Oid dboid);
+
#endif /* PG_DATABASE_H */
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 2008958010..e7328e4894 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -42,6 +42,7 @@ tests += {
't/034_create_database.pl',
't/035_standby_logical_decoding.pl',
't/036_truncated_dropped.pl',
+ 't/037_invalid_database.pl',
],
},
}
diff --git a/src/test/recovery/t/037_invalid_database.pl b/src/test/recovery/t/037_invalid_database.pl
new file mode 100644
index 0000000000..69a3531afc
--- /dev/null
+++ b/src/test/recovery/t/037_invalid_database.pl
@@ -0,0 +1,130 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+#
+# Test we handle interrupted DROP DATABASE correctly.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init;
+$node->append_conf(
+ "postgresql.conf", qq(
+autovacuum = off
+max_prepared_transactions=5
+log_min_duration_statement=0
+log_connections=on
+log_disconnections=on
+));
+
+$node->start;
+
+
+# First verify that we can't connect to or ALTER an invalid database. Just
+# mark the database as invalid ourselves, that's more reliable than hitting the
+# required race conditions (see testing further down)...
+
+$node->safe_psql(
+ "postgres", qq(
+CREATE DATABASE invalid;
+UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid';
+));
+
+my $psql_stdout = '';
+my $psql_stderr = '';
+diag $psql_stderr;
+diag $psql_stdout;
+
+ok($node->psql('invalid', '', stderr=>\$psql_stderr) == 2,
+ "can't connect to invalid database - error code");
+like($psql_stderr, qr/FATAL:\s+cannot connect to invalid database "invalid"/,
+ "can't connect to invalid database - error message");
+
+ok( $node->psql('postgres', 'ALTER DATABASE invalid CONNECTION LIMIT 10') ==
+ 2,
+ "can't ALTER invalid database");
+
+# check invalid database can't be used as a template
+ok( $node->psql('postgres', 'CREATE DATABASE copy_invalid TEMPLATE invalid')
+ == 3,
+ "can't use invalid database as template");
+
+
+# Verify that VACUUM ignores an invalid database when computing how much of
+# the clog is needed (vac_truncate_clog()). For that we modify the pg_database
+# row of the invalid database to have an outdated datfrozenxid.
+$psql_stderr = '';
+$node->psql(
+ 'postgres',
+ qq(
+UPDATE pg_database SET datfrozenxid = '123456' WHERE datname = 'invalid';
+DROP TABLE IF EXISTS foo_tbl; CREATE TABLE foo_tbl();
+VACUUM FREEZE;),
+ stderr => \$psql_stderr);
+unlike(
+ $psql_stderr,
+ qr/some databases have not been vacuumed in over 2 billion transactions/,
+ "invalid databases are ignored by vac_truncate_clog");
+
+
+# But we need to be able to drop an invalid database.
+ok($node->psql('postgres', 'DROP DATABASE invalid', stdout => \$psql_stdout, stderr => \$psql_stderr) == 0,
+ "can DROP invalid database");
+
+# Ensure database is gone
+ok($node->psql('postgres', 'DROP DATABASE invalid') == 3,
+ "can't drop already dropped database");
+
+
+# Test that interruption of DROP DATABASE is handled properly. To ensure the
+# interruption happens at the appropriate moment, we lock pg_tablespace. DROP
+# DATABASE scans pg_tablespace once it has reached the "irreversible" part of
+# dropping the database, making it a suitable point to wait.
+my $bgpsql = $node->background_psql('postgres', on_error_stop => 0);
+my $pid = $bgpsql->query('SELECT pg_backend_pid()');
+note "background pid is $pid";
+
+# create the database, prevent drop database via lock held by a 2PC transaction
+$bgpsql->query_safe(
+ qq(
+ CREATE DATABASE invalid_interrupt;
+ BEGIN;
+ LOCK pg_tablespace;
+ PREPARE TRANSACTION 'lock_tblspc';));
+
+# Try to drop. This will wait due to the still held lock.
+$bgpsql->query_until(qr//, "DROP DATABASE invalid_interrupt;\n");
+
+# Ensure we're waiting for the lock
+$node->poll_query_until('postgres',
+ qq(SELECT EXISTS(SELECT * FROM pg_locks WHERE NOT granted AND relation = 'pg_tablespace'::regclass AND mode = 'AccessShareLock');)
+);
+
+# and finally interrupt the DROP DATABASE
+ok($node->safe_psql('postgres', "SELECT pg_cancel_backend($pid)"),
+ "cancelled DROP DATABASE");
+
+# wait for cancellation to be processed
+pump_until(
+ $bgpsql->{run}, $bgpsql->{timeout}, \$bgpsql->{stderr},
+ (qr/canceling statement due to user request/, ";\n"),
+ "cancel processed");
+$bgpsql->{stderr} = '';
+
+# verify that connections to the database aren't allowed
+ok( $node->psql('invalid_interrupt', '') == 2,
+ "can't connect to invalid_interrupt database");
+
+# To properly drop the database, we need to release the lock previously preventing
+# doing so.
+ok($bgpsql->query_safe(qq(ROLLBACK PREPARED 'lock_tblspc')),
+ "unblock DROP DATABASE");
+
+ok($bgpsql->query(qq(DROP DATABASE invalid_interrupt)),
+ "DROP DATABASE invalid_interrupt");
+
+$bgpsql->quit();
+
+done_testing();
--
2.32.1 (Apple Git-133)
v4-0001-Release-lock-after-encountering-bogs-row-in-vac_t.patch
From 0a05391095185b5939b32d1d28b210d47c2bd597 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 22 Jun 2023 17:27:54 -0700
Subject: [PATCH v4 1/3] Release lock after encountering bogus row in
vac_truncate_clog()
When vac_truncate_clog() encounters bogus datfrozenxid / datminmxid values, it
returns early. Unfortunately, until now, it did not release
WrapLimitsVacuumLock. If the backend later tries to acquire
WrapLimitsVacuumLock, the session / autovacuum worker hangs in an
uncancellable way. Similarly, other sessions will hang waiting for the
lock. However, if the backend holding the lock exited or errored out for some
reason, the lock was released.
The bug was introduced as a side effect of 566372b3d643.
It is interesting that there are no production reports of this problem. That
is likely due to a mix of bugs leading to bogus values having gotten less
common, process exit releasing locks and instances of hangs being hard to
debug for "normal" users.
Discussion: https://postgr.es/m/20230621221208.vhsqgduwfpzwxnpg@awork3.anarazel.de
---
src/backend/commands/vacuum.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 57ca41add2..841188f71c 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1893,12 +1893,16 @@ vac_truncate_clog(TransactionId frozenXID,
ereport(WARNING,
(errmsg("some databases have not been vacuumed in over 2 billion transactions"),
errdetail("You might have already suffered transaction-wraparound data loss.")));
+ LWLockRelease(WrapLimitsVacuumLock);
return;
}
/* chicken out if data is bogus in any other way */
if (bogus)
+ {
+ LWLockRelease(WrapLimitsVacuumLock);
return;
+ }
/*
* Advance the oldest value for commit timestamps before truncating, so
--
2.32.1 (Apple Git-133)
v4-0003-pg_upgrade-Extend-check-for-database-not-allowing.patch
From 444bf4eb53c2fcb760e529004d31a68a1370c19f Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Wed, 12 Jul 2023 11:45:14 +0200
Subject: [PATCH v4 3/3] pg_upgrade: Extend check for database not allowing
connections
Rather than failing to connect, also check for datconnlimit being
-2 indicating that the database is invalid and needs to be dropped
before the upgrade.
---
src/bin/pg_upgrade/check.c | 26 ++++++++++++++++++++------
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 5 +----
2 files changed, 21 insertions(+), 10 deletions(-)
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..5255dee10c 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -595,6 +595,7 @@ check_proper_datallowconn(ClusterInfo *cluster)
int ntups;
int i_datname;
int i_datallowconn;
+ int i_datconnlimit;
FILE *script = NULL;
char output_path[MAXPGPATH];
@@ -602,23 +603,25 @@ check_proper_datallowconn(ClusterInfo *cluster)
snprintf(output_path, sizeof(output_path), "%s/%s",
log_opts.basedir,
- "databases_with_datallowconn_false.txt");
+ "databases_disallowing_connections.txt");
conn_template1 = connectToServer(cluster, "template1");
/* get database names */
dbres = executeQueryOrDie(conn_template1,
- "SELECT datname, datallowconn "
+ "SELECT datname, datallowconn, datconnlimit "
"FROM pg_catalog.pg_database");
i_datname = PQfnumber(dbres, "datname");
i_datallowconn = PQfnumber(dbres, "datallowconn");
+ i_datconnlimit = PQfnumber(dbres, "datconnlimit");
ntups = PQntuples(dbres);
for (dbnum = 0; dbnum < ntups; dbnum++)
{
char *datname = PQgetvalue(dbres, dbnum, i_datname);
char *datallowconn = PQgetvalue(dbres, dbnum, i_datallowconn);
+ char *datconnlimit = PQgetvalue(dbres, dbnum, i_datconnlimit);
if (strcmp(datname, "template0") == 0)
{
@@ -629,6 +632,15 @@ check_proper_datallowconn(ClusterInfo *cluster)
}
else
{
+ if (strcmp(datconnlimit, "-2") == 0)
+ {
+ if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
+ pg_fatal("could not open file \"%s\": %s",
+ output_path, strerror(errno));
+
+ fprintf(script, "invalid database: %s\n", datname);
+ }
+
/*
* avoid datallowconn == false databases from being skipped on
* restore
@@ -639,7 +651,7 @@ check_proper_datallowconn(ClusterInfo *cluster)
pg_fatal("could not open file \"%s\": %s",
output_path, strerror(errno));
- fprintf(script, "%s\n", datname);
+ fprintf(script, "connections not allowed: %s\n", datname);
}
}
}
@@ -653,9 +665,11 @@ check_proper_datallowconn(ClusterInfo *cluster)
fclose(script);
pg_log(PG_REPORT, "fatal");
pg_fatal("All non-template0 databases must allow connections, i.e. their\n"
- "pg_database.datallowconn must be true. Your installation contains\n"
- "non-template0 databases with their pg_database.datallowconn set to\n"
- "false. Consider allowing connection for all non-template0 databases\n"
+ "pg_database.datallowconn must be true and they must not be marked\n"
+ "as invalid, pg_database.datconnlimit must not be -2.\n"
+ "Your installation contains non-template0 databases with their\n"
+ "pg_database.datallowconn set to false or pg_database.datconnlimit to -2.\n"
+ "Consider allowing connection for all non-template0 databases\n"
"or drop the databases which do not allow connections. A list of\n"
"databases with the problem is in the file:\n"
" %s", output_path);
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index a5688a1cf2..41ad310434 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -352,7 +352,7 @@ ok(-d $newnode->data_dir . "/pg_upgrade_output.d",
rmtree($newnode->data_dir . "/pg_upgrade_output.d");
# Check that pg_upgrade aborts when encountering an invalid database
-command_checks_all(
+command_fails(
[
'pg_upgrade', '--no-sync', '-d', $oldnode->data_dir,
'-D', $newnode->data_dir, '-b', $oldbindir,
@@ -360,9 +360,6 @@ command_checks_all(
'-p', $oldnode->port, '-P', $newnode->port,
$mode, '--check',
],
- 1,
- [qr/invalid/], # pg_upgrade prints errors on stdout :(
- [qr//],
'invalid database causes failure');
rmtree($newnode->data_dir . "/pg_upgrade_output.d");
--
2.32.1 (Apple Git-133)
Hi,
On 2023-07-12 11:54:18 +0200, Daniel Gustafsson wrote:
On 12 Jul 2023, at 03:59, Andres Freund <andres@anarazel.de> wrote:
On 2023-07-07 14:09:08 +0200, Daniel Gustafsson wrote:
On 25 Jun 2023, at 19:03, Andres Freund <andres@anarazel.de> wrote:
On 2023-06-21 12:02:04 -0700, Andres Freund wrote:
There don't need to be explicit checks, because pg_upgrade will fail, because
it connects to every database. Obviously the error could be nicer, but it
seems ok for something hopefully very rare. I did add a test ensuring that the
behaviour is caught.
I don't see any pg_upgrade test in the patch?
Oops, I stashed them alongside some unrelated changes... Included this time.
Looking more at this I wonder if we in HEAD should make this a bit nicer by
extending the --check phase to catch this? I did a quick hack along these
lines in the 0003 commit attached here (0001 and 0002 are your unchanged
patches, just added for consistency and to be CFBot compatible). If done it
could be a separate commit to make the 0002 patch backport cleaner of course.
I don't really have an opinion on that, tbh...
+ errhint("Use DROP DATABASE to drop invalid databases"));
Should end with a period as a complete sentence?
I get confused about this every time. It's not helped by this example in
sources.sgml:
<programlisting>
Primary: could not create shared memory segment: %m
Detail: Failed syscall was shmget(key=%d, size=%u, 0%o).
Hint: the addendum
</programlisting>
Which notably does not use punctuation for the hint. But indeed, later we say:
<para>
Detail and hint messages: Use complete sentences, and end each with
a period. Capitalize the first word of sentences. Put two spaces after
the period if another sentence follows (for English text; might be
inappropriate in other languages).
</para>
That's not a very helpful example, and one which may give the wrong impression
unless the entire page is read. I've raised this with a small diff to improve
it on -docs.
Thanks for doing that!
Updated patches attached.
This version of the patchset LGTM.
Backpatching indeed was no fun. Not having BackgroundPsql.pm was the worst
part. But also a lot of other conflicts in tests... Took me 5-6 hours or
so.
But I now finally pushed the fixes. Hope the buildfarm agrees with it...
Thanks for the review!
Greetings,
Andres Freund
On 13 Jul 2023, at 22:52, Andres Freund <andres@anarazel.de> wrote:
On 2023-07-12 11:54:18 +0200, Daniel Gustafsson wrote:
Looking more at this I wonder if we in HEAD should make this a bit nicer by
extending the --check phase to catch this? I did a quick hack along these
lines in the 0003 commit attached here (0001 and 0002 are your unchanged
patches, just added for consistency and to be CFBot compatible). If done it
could be a separate commit to make the 0002 patch backport cleaner of course.
I don't really have an opinion on that, tbh...
Fair enough. Thinking more on it I think it has merits, so I will submit that
patch in its own thread on -hackers.
--
Daniel Gustafsson
I noticed that this patch set introduced this pg_dump test:
On 12.07.23 03:59, Andres Freund wrote:
+	'CREATE DATABASE invalid...' => {
+		create_order => 1,
+		create_sql => q(CREATE DATABASE invalid; UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid'),
+		regexp => qr/^CREATE DATABASE invalid/m,
+		not_like => {
+			pg_dumpall_dbprivs => 1,
+		},
+	},
But the key "not_like" isn't used for anything by that test suite.
Maybe "unlike" was meant? But even then it would be useless because the
"like" key is empty, so there is nothing that "unlike" can subtract
from. Was there something expected from the mention of
"pg_dumpall_dbprivs"?
Perhaps it would be better to write out
like => {},
explicitly, with a comment, like some other tests are doing.
Hi,
On 2023-09-25 01:48:31 +0100, Peter Eisentraut wrote:
I noticed that this patch set introduced this pg_dump test:
On 12.07.23 03:59, Andres Freund wrote:
+	'CREATE DATABASE invalid...' => {
+		create_order => 1,
+		create_sql => q(CREATE DATABASE invalid; UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid'),
+		regexp => qr/^CREATE DATABASE invalid/m,
+		not_like => {
+			pg_dumpall_dbprivs => 1,
+		},
+	},
But the key "not_like" isn't used for anything by that test suite. Maybe
"unlike" was meant?
It's not clear to me either. Invalid databases shouldn't *ever* be dumped, so
explicitly listing pg_dumpall_dbprivs is odd.
TBH, I find this testsuite the most opaque in postgres...
But even then it would be useless because the "like" key is empty, so there
is nothing that "unlike" can subtract from. Was there something expected
from the mention of "pg_dumpall_dbprivs"?
Not that I can figure out...
Perhaps it would be better to write out
like => {},
explicitly, with a comment, like some other tests are doing.
Yea, that looks like the right direction.
I'll go and backpatch the adjustment.
Greetings,
Andres Freund
Hi,
13.07.2023 23:52, Andres Freund wrote:
Backpatching indeed was no fun. Not having BackgroundPsql.pm was the worst
part. But also a lot of other conflicts in tests... Took me 5-6 hours or
so.
But I now finally pushed the fixes. Hope the buildfarm agrees with it...
Thanks for the review!
I've discovered that the test 037_invalid_database, introduced with
c66a7d75e, hangs when a server built with -DCLOBBER_CACHE_ALWAYS or with
debug_discard_caches = 1 set via TEMP_CONFIG:
echo "debug_discard_caches = 1" >/tmp/extra.config
TEMP_CONFIG=/tmp/extra.config make -s check -C src/test/recovery/ PROVE_TESTS="t/037*"
# +++ tap check in src/test/recovery +++
[09:05:48] t/037_invalid_database.pl .. 6/?
regress_log_037_invalid_database ends with:
[09:05:51.622](0.021s) # issuing query via background psql:
# CREATE DATABASE regression_invalid_interrupt;
# BEGIN;
# LOCK pg_tablespace;
# PREPARE TRANSACTION 'lock_tblspc';
[09:05:51.684](0.062s) ok 8 - blocked DROP DATABASE completion
I see two backends waiting:
law 2420132 2420108 0 09:05 ? 00:00:00 postgres: node: law postgres [local] DROP DATABASE waiting
law 2420135 2420108 0 09:05 ? 00:00:00 postgres: node: law postgres [local] startup waiting
and the latter's stack trace:
#0 0x00007f65c8fd3f9a in epoll_wait (epfd=9, events=0x563c40e15478, maxevents=1, timeout=-1) at
../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x0000563c3fa9a9fa in WaitEventSetWaitBlock (set=0x563c40e15410, cur_timeout=-1, occurred_events=0x7fff579dda80,
nevents=1) at latch.c:1570
#2 0x0000563c3fa9a8e4 in WaitEventSetWait (set=0x563c40e15410, timeout=-1, occurred_events=0x7fff579dda80, nevents=1,
wait_event_info=50331648) at latch.c:1516
#3 0x0000563c3fa99b14 in WaitLatch (latch=0x7f65c5e112e4, wakeEvents=33, timeout=0, wait_event_info=50331648) at
latch.c:538
#4 0x0000563c3fac7dee in ProcSleep (locallock=0x563c40e41e80, lockMethodTable=0x563c4007cba0 <default_lockmethod>) at
proc.c:1339
#5 0x0000563c3fab4160 in WaitOnLock (locallock=0x563c40e41e80, owner=0x563c40ea5af8) at lock.c:1816
#6 0x0000563c3fab2c80 in LockAcquireExtended (locktag=0x7fff579dde30, lockmode=1, sessionLock=false, dontWait=false,
reportMemoryError=true, locallockp=0x7fff579dde28) at lock.c:1080
#7 0x0000563c3faaf86d in LockRelationOid (relid=1213, lockmode=1) at lmgr.c:116
#8 0x0000563c3f537aff in relation_open (relationId=1213, lockmode=1) at relation.c:55
#9 0x0000563c3f5efde9 in table_open (relationId=1213, lockmode=1) at table.c:44
#10 0x0000563c3fca2227 in CatalogCacheInitializeCache (cache=0x563c40e8fe80) at catcache.c:980
#11 0x0000563c3fca255e in InitCatCachePhase2 (cache=0x563c40e8fe80, touch_index=true) at catcache.c:1083
#12 0x0000563c3fcc0556 in InitCatalogCachePhase2 () at syscache.c:184
#13 0x0000563c3fcb7db3 in RelationCacheInitializePhase3 () at relcache.c:4317
#14 0x0000563c3fce2748 in InitPostgres (in_dbname=0x563c40e54000 "postgres", dboid=5, username=0x563c40e53fe8 "law",
useroid=0, flags=1, out_dbname=0x0) at postinit.c:1177
#15 0x0000563c3fad90a7 in PostgresMain (dbname=0x563c40e54000 "postgres", username=0x563c40e53fe8 "law") at postgres.c:4229
#16 0x0000563c3f9f01e4 in BackendRun (port=0x563c40e45360) at postmaster.c:4475
It looks like no new backend can be started due to the pg_tablespace lock,
when a new relcache file is needed during the backend initialization.
Best regards,
Alexander
On Tue, Mar 12, 2024 at 9:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
I see two backends waiting:
law 2420132 2420108 0 09:05 ? 00:00:00 postgres: node: law postgres [local] DROP DATABASE waiting
law 2420135 2420108 0 09:05 ? 00:00:00 postgres: node: law postgres [local] startup waiting
Ugh.