pg_upgrade check for invalid databases
Hi,
if a cluster contains invalid databases that we cannot connect to
anymore, pg_upgrade would currently fail when trying to connect to the
first encountered invalid database with
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
connection failure: connection to server on socket "/tmp/.s.PGSQL.50432"
failed: FATAL: cannot connect to invalid database "foo"
HINT: Use DROP DATABASE to drop invalid databases.
Failure, exiting
If there is more than one invalid database, we need to run pg_upgrade
more than once (unless the user inspects pg_database).
I attached two small patches for PG 17 and PG 18 (can be easily
backported to all previous versions upon request). Instead of just
failing to connect with an error, we collect all invalid databases in a
report file invalid_databases.txt:
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
Checking for invalid databases fatal
Your installation contains invalid databases as a consequence of
interrupted DROP DATABASE. They are now marked as corrupted databases
that cannot be connected to anymore. Consider removing them using
DROP DATABASE ...;
A list of invalid databases is in the file:
/usr/local/pgsql/data/18/pg_upgrade_output.d/20240929T200559.707/invalid_databases.txt
Failure, exiting
Any thoughts on the proposed patches?
Attachments:
pg18-v1-0001-pg_upgrade-Add-check-for-invalid-databases.patchtext/x-patch; charset=UTF-8; name=pg18-v1-0001-pg_upgrade-Add-check-for-invalid-databases.patchDownload
From c44f723c033573fa5da72721595bc135a00e3cbf Mon Sep 17 00:00:00 2001
From: Thomas Krennwallner <teakay@aiven.io>
Date: Fri, 20 Sep 2024 17:33:05 -0400
Subject: [PATCH] pg_upgrade: Add check for invalid databases.
Currently, pg_upgrade fails to connect to the first invalid database
it encounters in get_loadable_libraries() and then aborts. While we
print a fatal message with a hint what should be done, the output is
terse and does not report further invalid databases present in an
installation.
Instead of just exiting on first error, we collect all
pg_database.datconnlimit values in get_db_infos(), ignore invalid
databases in process_slot(), and then perform check_for_invalid_dbs(),
which collects all invalid databases in a report file
invalid_databases.txt.
---
src/bin/pg_upgrade/check.c | 62 ++++++++++++++++++++++++++
src/bin/pg_upgrade/info.c | 5 ++-
src/bin/pg_upgrade/pg_upgrade.h | 2 +
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 2 +-
src/bin/pg_upgrade/task.c | 32 +++++++++++--
5 files changed, 98 insertions(+), 5 deletions(-)
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 12735a4268..b5b0f7cd6f 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -12,6 +12,7 @@
#include "catalog/pg_authid_d.h"
#include "catalog/pg_class_d.h"
#include "catalog/pg_collation.h"
+#include "catalog/pg_database.h"
#include "fe_utils/string_utils.h"
#include "mb/pg_wchar.h"
#include "pg_upgrade.h"
@@ -19,6 +20,7 @@
static void check_new_cluster_is_empty(void);
static void check_is_install_user(ClusterInfo *cluster);
static void check_proper_datallowconn(ClusterInfo *cluster);
+static void check_for_invalid_dbs(ClusterInfo *cluster);
static void check_for_prepared_transactions(ClusterInfo *cluster);
static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
@@ -598,6 +600,13 @@ check_and_dump_old_cluster(void)
*/
get_db_rel_and_slot_infos(&old_cluster);
+ /*
+ * Verify that the old cluster does not reference invalid databases.
+ * Connections to such databases are not allowed and break the following
+ * checks.
+ */
+ check_for_invalid_dbs(&old_cluster);
+
init_tablespaces();
get_loadable_libraries();
@@ -691,6 +700,13 @@ check_new_cluster(void)
{
get_db_rel_and_slot_infos(&new_cluster);
+ /*
+ * Verify that the new cluster does not reference invalid databases.
+ * Connections to such databases are not allowed and break the following
+ * checks.
+ */
+ check_for_invalid_dbs(&new_cluster);
+
check_new_cluster_is_empty();
check_loadable_libraries();
@@ -1087,6 +1103,52 @@ check_is_install_user(ClusterInfo *cluster)
check_ok();
}
+/*
+ * check_for_invalid_dbs
+ *
+ * Ensure that all databases are valid as connections to invalid databases
+ * are not allowed.
+ */
+static void
+check_for_invalid_dbs(ClusterInfo *cluster)
+{
+ int dbnum;
+ FILE *script = NULL;
+ char output_path[MAXPGPATH];
+
+ prep_status("Checking for invalid databases");
+
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "invalid_databases.txt");
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ DbInfo *active_db = &cluster->dbarr.dbs[dbnum];
+
+ if (active_db->db_connlimit == DATCONNLIMIT_INVALID_DB)
+ {
+ if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
+ pg_fatal("could not open file \"%s\": %m", output_path);
+ fprintf(script, "%s\n", active_db->db_name);
+ }
+ }
+
+ if (script)
+ {
+ fclose(script);
+ pg_log(PG_REPORT, "fatal");
+ pg_fatal("Your installation contains invalid databases as a consequence of\n"
+ "interrupted DROP DATABASE. They are now marked as corrupted databases\n"
+ "that cannot be connected to anymore. Consider removing them using\n"
+ " DROP DATABASE ...;\n"
+ "A list of invalid databases is in the file:\n"
+ " %s", output_path);
+ }
+ else
+ check_ok();
+}
+
/*
* check_proper_datallowconn
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index f83ded89cb..0e2bd40d1a 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -409,12 +409,13 @@ get_db_infos(ClusterInfo *cluster)
int tupnum;
DbInfo *dbinfos;
int i_datname,
+ i_datconnlimit,
i_oid,
i_spclocation;
char query[QUERY_ALLOC];
snprintf(query, sizeof(query),
- "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, ");
+ "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, d.datconnlimit, ");
if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
snprintf(query + strlen(query), sizeof(query) - strlen(query),
"datlocprovider, datlocale, ");
@@ -436,6 +437,7 @@ get_db_infos(ClusterInfo *cluster)
i_oid = PQfnumber(res, "oid");
i_datname = PQfnumber(res, "datname");
+ i_datconnlimit = PQfnumber(res, "datconnlimit");
i_spclocation = PQfnumber(res, "spclocation");
ntups = PQntuples(res);
@@ -445,6 +447,7 @@ get_db_infos(ClusterInfo *cluster)
{
dbinfos[tupnum].db_oid = atooid(PQgetvalue(res, tupnum, i_oid));
dbinfos[tupnum].db_name = pg_strdup(PQgetvalue(res, tupnum, i_datname));
+ dbinfos[tupnum].db_connlimit = atoi(PQgetvalue(res, tupnum, i_datconnlimit));
snprintf(dbinfos[tupnum].db_tablespace, sizeof(dbinfos[tupnum].db_tablespace), "%s",
PQgetvalue(res, tupnum, i_spclocation));
}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 53f693c2d4..b1c77ca6de 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -194,6 +194,8 @@ typedef struct
char *db_name; /* database name */
char db_tablespace[MAXPGPATH]; /* database default tablespace
* path */
+ int db_connlimit; /* database invalid if set to
+ * DATCONNLIMIT_INVALID_DB */
RelInfoArr rel_arr; /* array of all user relinfos */
LogicalSlotInfoArr slot_arr; /* array of all LogicalSlotInfo */
} DbInfo;
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 17af2ce61e..c7da6d27b2 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -424,7 +424,7 @@ SKIP:
$mode, '--check',
],
1,
- [qr/invalid/], # pg_upgrade prints errors on stdout :(
+ [qr/invalid_databases\.txt/], # pg_upgrade prints errors on stdout :(
[qr/^$/],
'invalid database causes failure');
rmtree($newnode->data_dir . "/pg_upgrade_output.d");
diff --git a/src/bin/pg_upgrade/task.c b/src/bin/pg_upgrade/task.c
index ba1726c25e..90de2070a1 100644
--- a/src/bin/pg_upgrade/task.c
+++ b/src/bin/pg_upgrade/task.c
@@ -44,6 +44,7 @@
#include "postgres_fe.h"
+#include "catalog/pg_database.h"
#include "common/connect.h"
#include "fe_utils/string_utils.h"
#include "pg_upgrade.h"
@@ -228,6 +229,18 @@ process_query_result(const ClusterInfo *cluster, UpgradeTaskSlot *slot,
PQclear(res);
}
+/*
+ * Closes the connection managed by the slot and frees the slot for immediate reuse.
+ */
+static void
+free_slot(UpgradeTaskSlot *slot)
+{
+ if (slot->conn)
+ PQfinish(slot->conn);
+ memset(slot, 0, sizeof(UpgradeTaskSlot));
+ slot->ready = true;
+}
+
/*
* Advances the state machine for a given slot as necessary.
*/
@@ -255,6 +268,21 @@ process_slot(const ClusterInfo *cluster, UpgradeTaskSlot *slot, const UpgradeTas
* new connection.
*/
slot->db_idx = dbs_processing++;
+
+ /*
+ * Do not try to connect to invalid databases here,
+ * check_for_invalid_dbs() will report them later. Skip this
+ * database instead, free the slot and start initiating the next
+ * connection.
+ */
+ if (cluster->dbarr.dbs[slot->db_idx].db_connlimit == DATCONNLIMIT_INVALID_DB)
+ {
+ dbs_complete++;
+ free_slot(slot);
+ process_slot(cluster, slot, task);
+ return;
+ }
+
slot->state = CONNECTING;
start_conn(cluster, slot);
@@ -314,9 +342,7 @@ process_slot(const ClusterInfo *cluster, UpgradeTaskSlot *slot, const UpgradeTas
* through the slots.
*/
dbs_complete++;
- PQfinish(slot->conn);
- memset(slot, 0, sizeof(UpgradeTaskSlot));
- slot->ready = true;
+ free_slot(slot);
process_slot(cluster, slot, task);
--
2.46.1
pg17-v1-0001-pg_upgrade-Add-check-for-invalid-databases.patchtext/x-patch; charset=UTF-8; name=pg17-v1-0001-pg_upgrade-Add-check-for-invalid-databases.patchDownload
From bd6da1bbbfe207e91c7ad4fff008da70a99d684c Mon Sep 17 00:00:00 2001
From: Thomas Krennwallner <teakay@aiven.io>
Date: Thu, 12 Sep 2024 16:29:06 -0400
Subject: [PATCH] pg_upgrade: Add check for invalid databases.
Currently, pg_upgrade fails to connect to the first invalid database
it encounters in get_loadable_libraries() and then aborts. While we
print a fatal message with a hint what should be done, the output is
terse and does not report further invalid databases present in an
installation.
Instead of just exiting on first error, we collect all
pg_database.datconnlimit values in get_db_infos(), ignore invalid
databases in get_db_rel_and_slot_infos(), and then perform
check_for_invalid_dbs(), which collects all invalid databases in a
report file invalid_databases.txt.
---
src/bin/pg_upgrade/check.c | 62 ++++++++++++++++++++++++++
src/bin/pg_upgrade/info.c | 13 +++++-
src/bin/pg_upgrade/pg_upgrade.h | 2 +
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 2 +-
4 files changed, 77 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 1cf84cc1bb..d56ea52c27 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -12,6 +12,7 @@
#include "catalog/pg_authid_d.h"
#include "catalog/pg_class_d.h"
#include "catalog/pg_collation.h"
+#include "catalog/pg_database.h"
#include "fe_utils/string_utils.h"
#include "mb/pg_wchar.h"
#include "pg_upgrade.h"
@@ -19,6 +20,7 @@
static void check_new_cluster_is_empty(void);
static void check_is_install_user(ClusterInfo *cluster);
static void check_proper_datallowconn(ClusterInfo *cluster);
+static void check_for_invalid_dbs(ClusterInfo *cluster);
static void check_for_prepared_transactions(ClusterInfo *cluster);
static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
@@ -586,6 +588,13 @@ check_and_dump_old_cluster(bool live_check)
*/
get_db_rel_and_slot_infos(&old_cluster, live_check);
+ /*
+ * Verify that the old cluster does not reference invalid databases.
+ * Connections to such databases are not allowed and break the following
+ * checks.
+ */
+ check_for_invalid_dbs(&old_cluster);
+
init_tablespaces();
get_loadable_libraries();
@@ -679,6 +688,13 @@ check_new_cluster(void)
{
get_db_rel_and_slot_infos(&new_cluster, false);
+ /*
+ * Verify that the new cluster does not reference invalid databases.
+ * Connections to such databases are not allowed and break the following
+ * checks.
+ */
+ check_for_invalid_dbs(&new_cluster);
+
check_new_cluster_is_empty();
check_loadable_libraries();
@@ -1075,6 +1091,52 @@ check_is_install_user(ClusterInfo *cluster)
check_ok();
}
+/*
+ * check_for_invalid_dbs
+ *
+ * Ensure that all databases are valid as connections to invalid databases
+ * are not allowed.
+ */
+static void
+check_for_invalid_dbs(ClusterInfo *cluster)
+{
+ int dbnum;
+ FILE *script = NULL;
+ char output_path[MAXPGPATH];
+
+ prep_status("Checking for invalid databases");
+
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "invalid_databases.txt");
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ DbInfo *active_db = &cluster->dbarr.dbs[dbnum];
+
+ if (active_db->db_connlimit == DATCONNLIMIT_INVALID_DB)
+ {
+ if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
+ pg_fatal("could not open file \"%s\": %m", output_path);
+ fprintf(script, "%s\n", active_db->db_name);
+ }
+ }
+
+ if (script)
+ {
+ fclose(script);
+ pg_log(PG_REPORT, "fatal");
+ pg_fatal("Your installation contains invalid databases as a consequence of\n"
+ "interrupted DROP DATABASE. They are now marked as corrupted databases\n"
+ "that cannot be connected to anymore. Consider removing them using\n"
+ " DROP DATABASE ...;\n"
+ "A list of invalid databases is in the file:\n"
+ " %s", output_path);
+ }
+ else
+ check_ok();
+}
+
/*
* check_proper_datallowconn
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 5c041fa06e..871b1fc30a 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -11,6 +11,7 @@
#include "access/transam.h"
#include "catalog/pg_class_d.h"
+#include "catalog/pg_database.h"
#include "pg_upgrade.h"
static void create_rel_filename_map(const char *old_data, const char *new_data,
@@ -290,6 +291,13 @@ get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
{
DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum];
+ /*
+ * Do not try to connect to invalid databases here,
+ * check_for_invalid_dbs() will report them later.
+ */
+ if (pDbInfo->db_connlimit == DATCONNLIMIT_INVALID_DB)
+ continue;
+
get_rel_infos(cluster, pDbInfo);
if (cluster == &old_cluster)
@@ -384,12 +392,13 @@ get_db_infos(ClusterInfo *cluster)
int tupnum;
DbInfo *dbinfos;
int i_datname,
+ i_datconnlimit,
i_oid,
i_spclocation;
char query[QUERY_ALLOC];
snprintf(query, sizeof(query),
- "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, ");
+ "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, d.datconnlimit, ");
if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
snprintf(query + strlen(query), sizeof(query) - strlen(query),
"datlocprovider, datlocale, ");
@@ -411,6 +420,7 @@ get_db_infos(ClusterInfo *cluster)
i_oid = PQfnumber(res, "oid");
i_datname = PQfnumber(res, "datname");
+ i_datconnlimit = PQfnumber(res, "datconnlimit");
i_spclocation = PQfnumber(res, "spclocation");
ntups = PQntuples(res);
@@ -420,6 +430,7 @@ get_db_infos(ClusterInfo *cluster)
{
dbinfos[tupnum].db_oid = atooid(PQgetvalue(res, tupnum, i_oid));
dbinfos[tupnum].db_name = pg_strdup(PQgetvalue(res, tupnum, i_datname));
+ dbinfos[tupnum].db_connlimit = atoi(PQgetvalue(res, tupnum, i_datconnlimit));
snprintf(dbinfos[tupnum].db_tablespace, sizeof(dbinfos[tupnum].db_tablespace), "%s",
PQgetvalue(res, tupnum, i_spclocation));
}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index e2b99b49fa..ed5d45fa06 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -195,6 +195,8 @@ typedef struct
char *db_name; /* database name */
char db_tablespace[MAXPGPATH]; /* database default tablespace
* path */
+ int db_connlimit; /* database invalid if set to
+ * DATCONNLIMIT_INVALID_DB */
RelInfoArr rel_arr; /* array of all user relinfos */
LogicalSlotInfoArr slot_arr; /* array of all LogicalSlotInfo */
} DbInfo;
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 17af2ce61e..c7da6d27b2 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -424,7 +424,7 @@ SKIP:
$mode, '--check',
],
1,
- [qr/invalid/], # pg_upgrade prints errors on stdout :(
+ [qr/invalid_databases\.txt/], # pg_upgrade prints errors on stdout :(
[qr/^$/],
'invalid database causes failure');
rmtree($newnode->data_dir . "/pg_upgrade_output.d");
--
2.46.1
On Sun, Sep 29, 2024 at 08:45:50PM -0400, Thomas Krennwallner wrote:
if a cluster contains invalid databases that we cannot connect to anymore,
pg_upgrade would currently fail when trying to connect to the first
encountered invalid database with[...]
If there is more than one invalid database, we need to run pg_upgrade more
than once (unless the user inspects pg_database).I attached two small patches for PG 17 and PG 18 (can be easily backported
to all previous versions upon request). Instead of just failing to connect
with an error, we collect all invalid databases in a report file
invalid_databases.txt:
Should we have pg_upgrade skip invalid databases? If the only valid action
is to drop them, IMHO it seems unnecessary to require users to manually
drop them before retrying pg_upgrade.
--
nathan
Nathan Bossart <nathandbossart@gmail.com> writes:
Should we have pg_upgrade skip invalid databases? If the only valid action
is to drop them, IMHO it seems unnecessary to require users to manually
drop them before retrying pg_upgrade.
I was thinking the same. But I wonder if there is any chance of
losing data that could be recoverable. It feels like this should
not be a default behavior.
TBH I'm not finding anything very much wrong with the current
behavior... this has to be a rare situation, do we need to add
debatable behavior to make it easier?
regards, tom lane
On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
TBH I'm not finding anything very much wrong with the current
behavior... this has to be a rare situation, do we need to add
debatable behavior to make it easier?
One argument would be to make the checks consistent, pg_upgrade generally tries
to report all the offending entries to help the user when fixing the source
database. Not sure if it's a strong enough argument for carrying code which
really shouldn't see much use though.
--
Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes:
On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
TBH I'm not finding anything very much wrong with the current
behavior... this has to be a rare situation, do we need to add
debatable behavior to make it easier?
One argument would be to make the checks consistent, pg_upgrade generally tries
to report all the offending entries to help the user when fixing the source
database. Not sure if it's a strong enough argument for carrying code which
really shouldn't see much use though.
OK, but the consistency argument would be to just report and fail.
I don't think there's a precedent in other pg_upgrade checks for
trying to fix problems automatically.
regards, tom lane
On 30/09/2024 17.29, Daniel Gustafsson wrote:
On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
TBH I'm not finding anything very much wrong with the current
behavior... this has to be a rare situation, do we need to add
debatable behavior to make it easier?One argument would be to make the checks consistent, pg_upgrade generally tries
to report all the offending entries to help the user when fixing the source
database. Not sure if it's a strong enough argument for carrying code which
really shouldn't see much use though.
In general, I agree that this situation should be rare for deliberate
DROP DATABASE interrupted in interactive sessions.
Unfortunately, for (popular) tools that perform automatic "temporary
database" cleanup, we could recently see an increase in invalid databases.
The additional check for pg_upgrade was made necessary due to several
unrelated customers having invalid databases that stem from left-over
Prisma Migrate "shadow databases" [1]https://www.prisma.io/docs/orm/prisma-migrate/understanding-prisma-migrate/shadow-database. We could not reproduce this
Prisma Migrate issue yet, as those migrations happened some time ago.
Maybe this bug really stems from a much older Prisma Migrate version and
we only see the fallout now. This is still a TODO item.
But it appears that this tool can get interrupted "at the wrong time"
while it is deleting temporary databases (probably a manual Ctrl-C), and
clients are unaware that this can then leave behind invalid databases.
Those temporary databases do not cause any harm as they are not used
anymore. But eventually, PG installations will be upgraded to the next
major version, and it is only then when those invalid databases
resurface after pg_upgrade fails to run the checks.
Long story short: interactive DROP DATABASE interrupts are rare (they do
exist, but customers are usually aware). Automation tools on the other
hand may run DROP DATABASE and when they get interrupted at the wrong
time they will then produce several left-over invalid databases.
pg_upgrade will then fail to run the checks.
[1]: https://www.prisma.io/docs/orm/prisma-migrate/understanding-prisma-migrate/shadow-database
https://www.prisma.io/docs/orm/prisma-migrate/understanding-prisma-migrate/shadow-database
On 1 Oct 2024, at 00:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
TBH I'm not finding anything very much wrong with the current
behavior... this has to be a rare situation, do we need to add
debatable behavior to make it easier?One argument would be to make the checks consistent, pg_upgrade generally tries
to report all the offending entries to help the user when fixing the source
database. Not sure if it's a strong enough argument for carrying code which
really shouldn't see much use though.OK, but the consistency argument would be to just report and fail.
I don't think there's a precedent in other pg_upgrade checks for
trying to fix problems automatically.
Correct, sorry for being unclear. The consistency argument would be to expand
pg_upgrade to report all invalid databases rather than just the first found;
attempting to fix problems would be a new behavior.
--
Daniel Gustafsson
On 1 Oct 2024, at 02:35, Thomas Krennwallner <tk@postsubmeta.net> wrote:
On 30/09/2024 17.29, Daniel Gustafsson wrote:
On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
TBH I'm not finding anything very much wrong with the current
behavior... this has to be a rare situation, do we need to add
debatable behavior to make it easier?One argument would be to make the checks consistent, pg_upgrade generally tries
to report all the offending entries to help the user when fixing the source
database. Not sure if it's a strong enough argument for carrying code which
really shouldn't see much use though.In general, I agree that this situation should be rare for deliberate DROP DATABASE interrupted in interactive sessions.
Unfortunately, for (popular) tools that perform automatic "temporary database" cleanup, we could recently see an increase in invalid databases.
The additional check for pg_upgrade was made necessary due to several unrelated customers having invalid databases that stem from left-over Prisma Migrate "shadow databases" [1]. We could not reproduce this Prisma Migrate issue yet, as those migrations happened some time ago. Maybe this bug really stems from a much older Prisma Migrate version and we only see the fallout now. This is still a TODO item.
But it appears that this tool can get interrupted "at the wrong time" while it is deleting temporary databases (probably a manual Ctrl-C), and clients are unaware that this can then leave behind invalid databases.
Those temporary databases do not cause any harm as they are not used anymore. But eventually, PG installations will be upgraded to the next major version, and it is only then when those invalid databases resurface after pg_upgrade fails to run the checks.
Databases containing transient data no longer needed left by buggy tools is one
thing, but pg_upgrade won't be able to differentiate between those and invalid
databases of legitimate interest. Allowing pg_upgrade to skip invalid
databases expose the risk of (potentially) valuable data being dropped during
the upgrade due to the user not having realized a rarely-used production
database was invalid.
Long story short: interactive DROP DATABASE interrupts are rare (they do exist, but customers are usually aware). Automation tools on the other hand may run DROP DATABASE and when they get interrupted at the wrong time they will then produce several left-over invalid databases. pg_upgrade will then fail to run the checks.
Checking and reporting all invalid databases during the check phase seems like
a user-friendly option here, I can agree that the current behaviour isn't great
for users experiencing this issue.
--
Daniel Gustafsson
On Tue, Oct 1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote:
On 1 Oct 2024, at 00:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
TBH I'm not finding anything very much wrong with the current
behavior... this has to be a rare situation, do we need to add
debatable behavior to make it easier?One argument would be to make the checks consistent, pg_upgrade generally tries
to report all the offending entries to help the user when fixing the source
database. Not sure if it's a strong enough argument for carrying code which
really shouldn't see much use though.OK, but the consistency argument would be to just report and fail.
I don't think there's a precedent in other pg_upgrade checks for
trying to fix problems automatically.Correct, sorry for being unclear. The consistency argument would be to expand
pg_upgrade to report all invalid databases rather than just the first found;
attempting to fix problems would be a new behavior.
Yes, historically pg_upgrade will fail if it finds anything unusual,
mostly because what it does normally is already scary enough. If users
what pg_upgrade to do cleanups, it would be enabled by a separate flag,
or even a new command-line app.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
On Mon, Oct 07, 2024 at 03:37:35PM -0400, Bruce Momjian wrote:
On Tue, Oct 1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote:
Correct, sorry for being unclear. The consistency argument would be to expand
pg_upgrade to report all invalid databases rather than just the first found;
attempting to fix problems would be a new behavior.Yes, historically pg_upgrade will fail if it finds anything unusual,
mostly because what it does normally is already scary enough. If users
what pg_upgrade to do cleanups, it would be enabled by a separate flag,
or even a new command-line app.
While I suspect it's rare that someone CTRL-C's out of an accidental DROP
DATABASE and then runs pg_upgrade before trying to recover the data, I
agree with the principle of having pg_upgrade fail by default for things
like this. If we did add a new flag, the new invalid database report that
Daniel mentions could say something like "try again with
--skip-invalid-databases to have pg_upgrade automatically drop invalid
databases."
--
nathan
On 7 Oct 2024, at 22:04, Nathan Bossart <nathandbossart@gmail.com> wrote:
On Mon, Oct 07, 2024 at 03:37:35PM -0400, Bruce Momjian wrote:
On Tue, Oct 1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote:
Correct, sorry for being unclear. The consistency argument would be to expand
pg_upgrade to report all invalid databases rather than just the first found;
attempting to fix problems would be a new behavior.Yes, historically pg_upgrade will fail if it finds anything unusual,
mostly because what it does normally is already scary enough. If users
what pg_upgrade to do cleanups, it would be enabled by a separate flag,
or even a new command-line app.While I suspect it's rare that someone CTRL-C's out of an accidental DROP
DATABASE and then runs pg_upgrade before trying to recover the data, I
agree with the principle of having pg_upgrade fail by default for things
like this. If we did add a new flag, the new invalid database report that
Daniel mentions could say something like "try again with
--skip-invalid-databases to have pg_upgrade automatically drop invalid
databases."
If we are teaching pg_upgrade to handle errors, either by skipping or by
fixing, then I believe this is the right way to go about it. A successful run
should probably also create a report of the databases which were skipped.
--
Daniel Gustafsson
On Fri, 11 Oct 2024 at 04:01, Daniel Gustafsson <daniel@yesql.se> wrote:
On 7 Oct 2024, at 22:04, Nathan Bossart <nathandbossart@gmail.com> wrote:
On Mon, Oct 07, 2024 at 03:37:35PM -0400, Bruce Momjian wrote:
On Tue, Oct 1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote:
Correct, sorry for being unclear. The consistency argument would be to expand
pg_upgrade to report all invalid databases rather than just the first found;
attempting to fix problems would be a new behavior.Yes, historically pg_upgrade will fail if it finds anything unusual,
mostly because what it does normally is already scary enough. If users
what pg_upgrade to do cleanups, it would be enabled by a separate flag,
or even a new command-line app.While I suspect it's rare that someone CTRL-C's out of an accidental DROP
DATABASE and then runs pg_upgrade before trying to recover the data, I
agree with the principle of having pg_upgrade fail by default for things
like this. If we did add a new flag, the new invalid database report that
Daniel mentions could say something like "try again with
--skip-invalid-databases to have pg_upgrade automatically drop invalid
databases."If we are teaching pg_upgrade to handle errors, either by skipping or by
fixing, then I believe this is the right way to go about it. A successful run
should probably also create a report of the databases which were skipped.
In v2 I've made changes to the patch incorporating the suggestions here:
* Default behaviour is to just fail with a report of all invalid databases
* A new option --skip-invalid-databases will then skip the checks, and
would not transfer any invalid database to the new cluster. A warning
with a report file will then follow after a successful run.
Dropping invalid databases in the old cluster will make invalid
databases unrecoverable, so I opted for a skip over invalid databases
approach that would leave invalid databases in the old cluster.
Apart from a missing --skip-invalid-databases test, does this attempt look OK?
Attachments:
pg18-v2-0001-pg_upgrade-Add-check-for-invalid-databases.patchtext/x-patch; charset=US-ASCII; name=pg18-v2-0001-pg_upgrade-Add-check-for-invalid-databases.patchDownload
From e51f581ddc1158a9fb2840dfc04863618a390887 Mon Sep 17 00:00:00 2001
From: Thomas Krennwallner <teakay@aiven.io>
Date: Fri, 20 Sep 2024 17:33:05 -0400
Subject: [PATCH] pg_upgrade: Add check for invalid databases.
Currently, pg_upgrade fails to connect to the first invalid database
it encounters in get_loadable_libraries() and then aborts. While we
print a fatal message with a hint what should be done, the output is
terse and does not report further invalid databases present in an
installation.
Instead of just exiting on first error, we collect all
pg_database.datconnlimit values in get_db_infos() and order invalid
databases after valid ones in cluster.dbarr. This allows for skipping
over invalid databases in all checks with --skip-invalid-databases, a
new option where we do not transfer such databases from the old
cluster.
Unless we use --skip-invalid-databases, check_for_invalid_dbs()
collects all invalid databases in a report file invalid_databases.txt.
After a successful pg_upgrade run with skipped invalid databases,
create_invalid_databases_report() will collect them in a report file
skipped_invalid_databases.txt for the completion banner output.
---
doc/src/sgml/ref/pgupgrade.sgml | 19 ++++
src/bin/pg_upgrade/check.c | 131 +++++++++++++++++++++++--
src/bin/pg_upgrade/info.c | 20 +++-
src/bin/pg_upgrade/option.c | 7 ++
src/bin/pg_upgrade/pg_upgrade.c | 18 +++-
src/bin/pg_upgrade/pg_upgrade.h | 8 +-
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 2 +-
7 files changed, 187 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 4777381dac..a9a6b62e7a 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -253,6 +253,25 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--skip-invalid-databases</option></term>
+ <listitem>
+ <para>
+ If a cluster contains invalid databases, the default behaviour of
+ <application>pg_upgrade</application> is to fail with a report of all
+ such databases found in an installation. Invalid databases, which cannot
+ be connected to anymore, are a consequence of interrupted
+ <command>DROP DATABASE</command> and marked as corrupted databases in
+ <link linkend="catalog-pg-database"><structname>pg_database</structname></link>.<structfield>datconnlimit</structfield>.
+ </para>
+ <para>
+ When using option <option>--skip-invalid-databases</option>,
+ <application>pg_upgrade</application> instead will skip all invalid
+ databases and do not transfer them over to the new cluster.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--sync-method=</option><replaceable>method</replaceable></term>
<listitem>
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 12735a4268..e372feff1a 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -12,6 +12,7 @@
#include "catalog/pg_authid_d.h"
#include "catalog/pg_class_d.h"
#include "catalog/pg_collation.h"
+#include "catalog/pg_database.h"
#include "fe_utils/string_utils.h"
#include "mb/pg_wchar.h"
#include "pg_upgrade.h"
@@ -19,6 +20,7 @@
static void check_new_cluster_is_empty(void);
static void check_is_install_user(ClusterInfo *cluster);
static void check_proper_datallowconn(ClusterInfo *cluster);
+static void check_for_invalid_dbs(ClusterInfo *cluster);
static void check_for_prepared_transactions(ClusterInfo *cluster);
static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
@@ -598,6 +600,11 @@ check_and_dump_old_cluster(void)
*/
get_db_rel_and_slot_infos(&old_cluster);
+ /*
+ * Verify that the old cluster does not contain invalid databases.
+ */
+ check_for_invalid_dbs(&old_cluster);
+
init_tablespaces();
get_loadable_libraries();
@@ -691,6 +698,11 @@ check_new_cluster(void)
{
get_db_rel_and_slot_infos(&new_cluster);
+ /*
+ * Verify that the new cluster does not contain invalid databases.
+ */
+ check_for_invalid_dbs(&new_cluster);
+
check_new_cluster_is_empty();
check_loadable_libraries();
@@ -763,7 +775,7 @@ issue_warnings_and_set_wal_level(void)
void
-output_completion_banner(char *deletion_script_file_name)
+output_completion_banner(char *deletion_script_file_name, char *skipped_invalid_databases_file_name)
{
PQExpBufferData user_specification;
@@ -775,6 +787,14 @@ output_completion_banner(char *deletion_script_file_name)
appendPQExpBufferChar(&user_specification, ' ');
}
+ if (skipped_invalid_databases_file_name)
+ {
+ pg_log(PG_REPORT,
+ "The following invalid databases were not transferred by pg_upgrade:\n"
+ " %s",
+ skipped_invalid_databases_file_name);
+ }
+
pg_log(PG_REPORT,
"Optimizer statistics are not transferred by pg_upgrade.\n"
"Once you start the new server, consider running:\n"
@@ -998,7 +1018,7 @@ create_script_for_old_cluster_deletion(char **deletion_script_file_name)
fprintf(script, "\n");
- for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ntotaldbs; dbnum++)
fprintf(script, RMDIR_CMD " %c%s%c%u%c\n", PATH_QUOTE,
fix_path_separator(os_info.old_tablespaces[tblnum]),
PATH_SEPARATOR, old_cluster.dbarr.dbs[dbnum].db_oid,
@@ -1087,6 +1107,96 @@ check_is_install_user(ClusterInfo *cluster)
check_ok();
}
+/*
+ * check_for_invalid_dbs
+ *
+ * Ensure that all databases are valid as connections to invalid databases
+ * are not allowed.
+ */
+static void
+check_for_invalid_dbs(ClusterInfo *cluster)
+{
+ int dbnum;
+ FILE *script = NULL;
+ char output_path[MAXPGPATH];
+
+ prep_status("Checking for invalid databases");
+
+ if (user_opts.skip_invalid_dbs)
+ {
+ pg_log(PG_REPORT, "skipped");
+ return;
+ }
+
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "invalid_databases.txt");
+
+ /*
+ * If invalid databases exist in cluster they are ordered after the valid
+ * ones in dbarr.
+ */
+ for (dbnum = cluster->dbarr.ndbs; dbnum < cluster->dbarr.ntotaldbs; dbnum++)
+ {
+ if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
+ pg_fatal("could not open file \"%s\": %m", output_path);
+ fprintf(script, "%s\n", cluster->dbarr.dbs[dbnum].db_name);
+ }
+
+ if (script)
+ {
+ fclose(script);
+ pg_log(PG_REPORT, "fatal");
+ pg_fatal("Your installation contains invalid databases as a consequence of\n"
+ "interrupted DROP DATABASE. They are now marked as corrupted databases\n"
+ "that cannot be connected to anymore. Consider removing them using\n"
+ " DROP DATABASE ...;\n"
+ "or run pg_upgrade with --skip-invalid-databases to skip transferring\n"
+ "invalid databases to the new cluster.\n"
+ "A list of invalid databases is in the file:\n"
+ " %s", output_path);
+ }
+ else
+ check_ok();
+}
+
+
+/*
+ * create_invalid_databases_report()
+ *
+ * This is particularly useful for listing skipped invalid databases.
+ */
+void
+create_invalid_databases_report(char **skipped_invalid_databases_file_name)
+{
+ int dbnum;
+ FILE *script = NULL;
+
+ if (!user_opts.skip_invalid_dbs || old_cluster.dbarr.ndbs == old_cluster.dbarr.ntotaldbs)
+ return;
+
+ *skipped_invalid_databases_file_name = psprintf("%sskipped_invalid_databases.txt",
+ SCRIPT_PREFIX);
+
+ prep_status("Creating report for skipped invalid databases");
+
+ for (dbnum = old_cluster.dbarr.ndbs; dbnum < old_cluster.dbarr.ntotaldbs; dbnum++)
+ {
+ if (script == NULL && (script = fopen_priv(*skipped_invalid_databases_file_name, "w")) == NULL)
+ pg_fatal("could not open file \"%s\": %m", *skipped_invalid_databases_file_name);
+ fprintf(script, "%s\n", old_cluster.dbarr.dbs[dbnum].db_name);
+ }
+
+ if (script != NULL)
+ {
+ report_status(PG_WARNING, "warning");
+ pg_log(PG_WARNING,
+ "\nWARNING: old cluster contains invalid databases which have been skipped");
+ fclose(script);
+ }
+ else
+ check_ok();
+}
/*
* check_proper_datallowconn
@@ -1115,10 +1225,15 @@ check_proper_datallowconn(ClusterInfo *cluster)
conn_template1 = connectToServer(cluster, "template1");
- /* get database names */
+ /*
+ * get database names; skip invalid databases as they are handled in
+ * check_for_invalid_dbs()
+ */
dbres = executeQueryOrDie(conn_template1,
"SELECT datname, datallowconn "
- "FROM pg_catalog.pg_database");
+ "FROM pg_catalog.pg_database "
+ "WHERE datconnlimit != %d",
+ DATCONNLIMIT_INVALID_DB);
i_datname = PQfnumber(dbres, "datname");
i_datallowconn = PQfnumber(dbres, "datallowconn");
@@ -1982,7 +2097,9 @@ check_old_cluster_subscription_state(void)
/*
* Check that all the subscriptions have their respective replication
- * origin. This check only needs to run once.
+ * origin. This check only needs to run once. Skip invalid databases,
+ * this check will only be performed when user_opts.skip_invalid_dbs is
+ * true.
*/
conn = connectToServer(&old_cluster, old_cluster.dbarr.dbs[0].db_name);
res = executeQueryOrDie(conn,
@@ -1992,7 +2109,9 @@ check_old_cluster_subscription_state(void)
" ON o.roname = 'pg_' || s.oid "
"INNER JOIN pg_catalog.pg_database d "
" ON d.oid = s.subdbid "
- "WHERE o.roname IS NULL;");
+ "WHERE o.roname IS NULL "
+ " AND d.datconnlimit != %d",
+ DATCONNLIMIT_INVALID_DB);
ntup = PQntuples(res);
for (int i = 0; i < ntup; i++)
{
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index f83ded89cb..514ed2d612 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -11,6 +11,7 @@
#include "access/transam.h"
#include "catalog/pg_class_d.h"
+#include "catalog/pg_database.h"
#include "pg_upgrade.h"
#include "pqexpbuffer.h"
@@ -407,14 +408,16 @@ get_db_infos(ClusterInfo *cluster)
PGresult *res;
int ntups;
int tupnum;
+ int valid_tups;
DbInfo *dbinfos;
int i_datname,
+ i_datconnlimit,
i_oid,
i_spclocation;
char query[QUERY_ALLOC];
snprintf(query, sizeof(query),
- "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, ");
+ "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, d.datconnlimit, ");
if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
snprintf(query + strlen(query), sizeof(query) - strlen(query),
"datlocprovider, datlocale, ");
@@ -430,30 +433,36 @@ get_db_infos(ClusterInfo *cluster)
" LEFT OUTER JOIN pg_catalog.pg_tablespace t "
" ON d.dattablespace = t.oid "
"WHERE d.datallowconn = true "
- "ORDER BY 1");
+ "ORDER BY "
+ " CASE WHEN d.datconnlimit != %d THEN 0 ELSE 1 END, "
+ " d.oid", DATCONNLIMIT_INVALID_DB);
res = executeQueryOrDie(conn, "%s", query);
i_oid = PQfnumber(res, "oid");
i_datname = PQfnumber(res, "datname");
+ i_datconnlimit = PQfnumber(res, "datconnlimit");
i_spclocation = PQfnumber(res, "spclocation");
ntups = PQntuples(res);
dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
- for (tupnum = 0; tupnum < ntups; tupnum++)
+ for (tupnum = 0, valid_tups = 0; tupnum < ntups; tupnum++)
{
dbinfos[tupnum].db_oid = atooid(PQgetvalue(res, tupnum, i_oid));
dbinfos[tupnum].db_name = pg_strdup(PQgetvalue(res, tupnum, i_datname));
snprintf(dbinfos[tupnum].db_tablespace, sizeof(dbinfos[tupnum].db_tablespace), "%s",
PQgetvalue(res, tupnum, i_spclocation));
+ if (atoi(PQgetvalue(res, tupnum, i_datconnlimit)) != DATCONNLIMIT_INVALID_DB)
+ valid_tups++;
}
PQclear(res);
PQfinish(conn);
cluster->dbarr.dbs = dbinfos;
- cluster->dbarr.ndbs = ntups;
+ cluster->dbarr.ndbs = valid_tups;
+ cluster->dbarr.ntotaldbs = ntups;
}
@@ -774,7 +783,7 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
{
int dbnum;
- for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
+ for (dbnum = 0; dbnum < db_arr->ntotaldbs; dbnum++)
{
free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
pg_free(db_arr->dbs[dbnum].db_name);
@@ -782,6 +791,7 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
pg_free(db_arr->dbs);
db_arr->dbs = NULL;
db_arr->ndbs = 0;
+ db_arr->ntotaldbs = 0;
}
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 6f41d63eed..1937796a72 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -60,6 +60,7 @@ parseCommandLine(int argc, char *argv[])
{"copy", no_argument, NULL, 2},
{"copy-file-range", no_argument, NULL, 3},
{"sync-method", required_argument, NULL, 4},
+ {"skip-invalid-databases", no_argument, NULL, 5},
{NULL, 0, NULL, 0}
};
@@ -69,6 +70,7 @@ parseCommandLine(int argc, char *argv[])
DataDirSyncMethod unused;
user_opts.do_sync = true;
+ user_opts.skip_invalid_dbs = false;
user_opts.transfer_mode = TRANSFER_MODE_COPY;
os_info.progname = get_progname(argv[0]);
@@ -212,6 +214,10 @@ parseCommandLine(int argc, char *argv[])
user_opts.sync_method = pg_strdup(optarg);
break;
+ case 5:
+ user_opts.skip_invalid_dbs = true;
+ break;
+
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
os_info.progname);
@@ -306,6 +312,7 @@ usage(void)
printf(_(" --clone clone instead of copying files to new cluster\n"));
printf(_(" --copy copy files to new cluster (default)\n"));
printf(_(" --copy-file-range copy files to new cluster with copy_file_range\n"));
+ printf(_(" --skip-invalid-databases skip transferring invalid databases to the new cluster\n"));
printf(_(" --sync-method=METHOD set method for syncing files to disk\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 663235816f..b0353327b9 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -41,6 +41,7 @@
#include <time.h>
#include "catalog/pg_class_d.h"
+#include "catalog/pg_database.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/restricted_token.h"
@@ -83,7 +84,8 @@ char *output_files[] = {
int
main(int argc, char **argv)
{
- char *deletion_script_file_name = NULL;
+ char *deletion_script_file_name = NULL,
+ *skipped_invalid_databases_file_name = NULL;
/*
* pg_upgrade doesn't currently use common/logging.c, but initialize it
@@ -219,6 +221,8 @@ main(int argc, char **argv)
create_script_for_old_cluster_deletion(&deletion_script_file_name);
+ create_invalid_databases_report(&skipped_invalid_databases_file_name);
+
issue_warnings_and_set_wal_level();
pg_log(PG_REPORT,
@@ -226,9 +230,10 @@ main(int argc, char **argv)
"Upgrade Complete\n"
"----------------");
- output_completion_banner(deletion_script_file_name);
+ output_completion_banner(deletion_script_file_name, skipped_invalid_databases_file_name);
pg_free(deletion_script_file_name);
+ pg_free(skipped_invalid_databases_file_name);
cleanup_output_dirs();
@@ -859,10 +864,15 @@ set_frozenxids(bool minmxid_only)
"SET datminmxid = '%u'",
old_cluster.controldata.chkpnt_nxtmulti));
- /* get database names */
+ /*
+ * get database names; skip invalid databases as they are handled in
+ * check_for_invalid_dbs()
+ */
dbres = executeQueryOrDie(conn_template1,
"SELECT datname, datallowconn "
- "FROM pg_catalog.pg_database");
+ "FROM pg_catalog.pg_database "
+ "WHERE datconnlimit != %d",
+ DATCONNLIMIT_INVALID_DB);
i_datname = PQfnumber(dbres, "datname");
i_datallowconn = PQfnumber(dbres, "datallowconn");
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 53f693c2d4..8abb1688ca 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -213,7 +213,8 @@ typedef struct
typedef struct
{
DbInfo *dbs; /* array of db infos */
- int ndbs; /* number of db infos */
+ int ndbs; /* number of valid db infos */
+ int ntotaldbs; /* total number of db infos */
} DbInfoArr;
/*
@@ -323,6 +324,8 @@ typedef struct
bool check; /* check clusters only, don't change any data */
bool live_check; /* check clusters only, old server is running */
bool do_sync; /* flush changes to disk */
+ bool skip_invalid_dbs; /* skip transferring invalid databases to
+ * the new cluster */
transferMode transfer_mode; /* copy files or link them? */
int jobs; /* number of processes/threads to use */
char *socketdir; /* directory to use for Unix sockets */
@@ -371,10 +374,11 @@ void check_and_dump_old_cluster(void);
void check_new_cluster(void);
void report_clusters_compatible(void);
void issue_warnings_and_set_wal_level(void);
-void output_completion_banner(char *deletion_script_file_name);
+void output_completion_banner(char *deletion_script_file_name, char *skipped_invalid_databases_file_name);
void check_cluster_versions(void);
void check_cluster_compatibility(void);
void create_script_for_old_cluster_deletion(char **deletion_script_file_name);
+void create_invalid_databases_report(char **skipped_invalid_databases_file_name);
/* controldata.c */
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 17af2ce61e..c7da6d27b2 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -424,7 +424,7 @@ SKIP:
$mode, '--check',
],
1,
- [qr/invalid/], # pg_upgrade prints errors on stdout :(
+ [qr/invalid_databases\.txt/], # pg_upgrade prints errors on stdout :(
[qr/^$/],
'invalid database causes failure');
rmtree($newnode->data_dir . "/pg_upgrade_output.d");
--
2.46.2
On Sun, Oct 13, 2024 at 08:28:57AM -0400, Thomas Krennwallner wrote:
In v2 I've made changes to the patch incorporating the suggestions here:
* Default behaviour is to just fail with a report of all invalid databases
* A new option --skip-invalid-databases will then skip the checks, and
would not transfer any invalid database to the new cluster. A warning
with a report file will then follow after a successful run.Dropping invalid databases in the old cluster will make invalid
databases unrecoverable, so I opted for a skip over invalid databases
approach that would leave invalid databases in the old cluster.Apart from a missing --skip-invalid-databases test, does this attempt look OK?
I don't think there is enough demand for the feature of skipping invalid
databases because we have gotten few reports of such problems, and also
because your case is related to an external tool causing this problem.
What might be acceptable would be to add an option that would make
pg_upgrade more tolerant of problems in many areas, that is a lot more
research and discussion.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
On 14 Oct 2024, at 18:57, Bruce Momjian <bruce@momjian.us> wrote:
What might be acceptable would be to add an option that would make
pg_upgrade more tolerant of problems in many areas, that is a lot more
research and discussion.
I agree that the concept of having pg_upgrade perform (opt-in) skipping and/or
repairs of the old cluster warrants a larger discussion in its own thread.
There has been significant amount of energy spent recently to add structure to
the checks, any new feature should be properly designed for the get-go.
In the meantime, the OP has a good point that it's a tad silly that pg_upgrade
fails hard on invalid databases instead of detecting and reporting like how
other errors are handled. The attached adds this check and expands the report
wording to cover it.
--
Daniel Gustafsson
Attachments:
0001-Find-invalid-databases-during-upgrade-check-stage.patchapplication/octet-stream; name=0001-Find-invalid-databases-during-upgrade-check-stage.patch; x-unix-mode=0644Download
From 3608e477cd5d03074f8d5b0fa9d2890cb6944ca5 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Fri, 25 Oct 2024 13:45:28 +0200
Subject: [PATCH] Find invalid databases during upgrade check stage
Before continuing with the check start by checking that all databases
allow connections to avoid a hard fail without proper error reporting.
Inspired by a larger patch by Thomas Krennwallner.
Discussion: https://postgr.es/m/f9315bf0-e03e-4490-9f0d-5b6f7a6d9908@postsubmeta.net
---
src/bin/pg_upgrade/check.c | 36 ++++++++++++++++----------
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 2 +-
2 files changed, 23 insertions(+), 15 deletions(-)
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 12735a4268..2fce686530 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -18,7 +18,7 @@
static void check_new_cluster_is_empty(void);
static void check_is_install_user(ClusterInfo *cluster);
-static void check_proper_datallowconn(ClusterInfo *cluster);
+static void check_for_connection_status(ClusterInfo *cluster);
static void check_for_prepared_transactions(ClusterInfo *cluster);
static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
@@ -592,6 +592,12 @@ check_and_dump_old_cluster(void)
if (!user_opts.live_check)
start_postmaster(&old_cluster, true);
+ /*
+ * First check that all databases allow connections since we'll otherwise
+ * fail in later stages.
+ */
+ check_for_connection_status(&old_cluster);
+
/*
* Extract a list of databases, tables, and logical replication slots from
* the old cluster.
@@ -607,7 +613,6 @@ check_and_dump_old_cluster(void)
* Check for various failure cases
*/
check_is_install_user(&old_cluster);
- check_proper_datallowconn(&old_cluster);
check_for_prepared_transactions(&old_cluster);
check_for_isn_and_int8_passing_mismatch(&old_cluster);
@@ -1089,14 +1094,14 @@ check_is_install_user(ClusterInfo *cluster)
/*
- * check_proper_datallowconn
+ * check_for_connection_status
*
* Ensure that all non-template0 databases allow connections since they
* otherwise won't be restored; and that template0 explicitly doesn't allow
* connections since it would make pg_dumpall --globals restore fail.
*/
static void
-check_proper_datallowconn(ClusterInfo *cluster)
+check_for_connection_status(ClusterInfo *cluster)
{
int dbnum;
PGconn *conn_template1;
@@ -1104,6 +1109,7 @@ check_proper_datallowconn(ClusterInfo *cluster)
int ntups;
int i_datname;
int i_datallowconn;
+ int i_datconnlimit;
FILE *script = NULL;
char output_path[MAXPGPATH];
@@ -1111,23 +1117,25 @@ check_proper_datallowconn(ClusterInfo *cluster)
snprintf(output_path, sizeof(output_path), "%s/%s",
log_opts.basedir,
- "databases_with_datallowconn_false.txt");
+ "databases_cannot_connect_to.txt");
conn_template1 = connectToServer(cluster, "template1");
/* get database names */
dbres = executeQueryOrDie(conn_template1,
- "SELECT datname, datallowconn "
+ "SELECT datname, datallowconn, datconnlimit "
"FROM pg_catalog.pg_database");
i_datname = PQfnumber(dbres, "datname");
i_datallowconn = PQfnumber(dbres, "datallowconn");
+ i_datconnlimit = PQfnumber(dbres, "datconnlimit");
ntups = PQntuples(dbres);
for (dbnum = 0; dbnum < ntups; dbnum++)
{
char *datname = PQgetvalue(dbres, dbnum, i_datname);
char *datallowconn = PQgetvalue(dbres, dbnum, i_datallowconn);
+ char *datconnlimit = PQgetvalue(dbres, dbnum, i_datconnlimit);
if (strcmp(datname, "template0") == 0)
{
@@ -1139,10 +1147,10 @@ check_proper_datallowconn(ClusterInfo *cluster)
else
{
/*
- * avoid datallowconn == false databases from being skipped on
- * restore
+ * Avoid datallowconn == false databases from being skipped on
+ * restore, and ensure that no databases are marked invalid (-2).
*/
- if (strcmp(datallowconn, "f") == 0)
+ if ((strcmp(datallowconn, "f") == 0) || strcmp(datconnlimit, "-2") == 0)
{
if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
pg_fatal("could not open file \"%s\": %m", output_path);
@@ -1161,11 +1169,11 @@ check_proper_datallowconn(ClusterInfo *cluster)
fclose(script);
pg_log(PG_REPORT, "fatal");
pg_fatal("All non-template0 databases must allow connections, i.e. their\n"
- "pg_database.datallowconn must be true. Your installation contains\n"
- "non-template0 databases with their pg_database.datallowconn set to\n"
- "false. Consider allowing connection for all non-template0 databases\n"
- "or drop the databases which do not allow connections. A list of\n"
- "databases with the problem is in the file:\n"
+ "pg_database.datallowconn must be true and pg_database.datconnlimit\n"
+ "must not be -2. Your installation contains non-template0 databases\n"
+ "which cannot be connected to. Consider allowing connection for all\n"
+ "non-template0 databases or drop the databases which do not allow\n"
+ "connections. A list of databases with the problem is in the file:\n"
" %s", output_path);
}
else
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 17af2ce61e..9b51f9e666 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -424,7 +424,7 @@ SKIP:
$mode, '--check',
],
1,
- [qr/invalid/], # pg_upgrade prints errors on stdout :(
+ [qr/datconnlimit/],
[qr/^$/],
'invalid database causes failure');
rmtree($newnode->data_dir . "/pg_upgrade_output.d");
--
2.39.3 (Apple Git-146)
On Fri, Oct 25, 2024 at 01:55:57PM +0200, Daniel Gustafsson wrote:
On 14 Oct 2024, at 18:57, Bruce Momjian <bruce@momjian.us> wrote:
What might be acceptable would be to add an option that would make
pg_upgrade more tolerant of problems in many areas, that is a lot more
research and discussion.I agree that the concept of having pg_upgrade perform (opt-in) skipping and/or
repairs of the old cluster warrants a larger discussion in its own thread.
There has been significant amount of energy spent recently to add structure to
the checks, any new feature should be properly designed for the get-go.In the meantime, the OP has a good point that it's a tad silly that pg_upgrade
fails hard on invalid databases instead of detecting and reporting like how
other errors are handled. The attached adds this check and expands the report
wording to cover it.
Agreed.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
On 1 Nov 2024, at 01:36, Bruce Momjian <bruce@momjian.us> wrote:
On Fri, Oct 25, 2024 at 01:55:57PM +0200, Daniel Gustafsson wrote:
In the meantime, the OP has a good point that it's a tad silly that pg_upgrade
fails hard on invalid databases instead of detecting and reporting like how
other errors are handled. The attached adds this check and expands the report
wording to cover it.Agreed.
I've applied this part, the discussion on whether or not pg_upgrade should gain
capabilities to skip and/or fix issues should probably be carried over in a new
thread.
--
Daniel Gustafsson