Preserve versions of initdb-created collations in pg_upgrade

Started by Peter Eisentrautabout 6 years ago5 messages
#1Peter Eisentraut
peter.eisentraut@2ndquadrant.com
1 attachment(s)

As mentioned in [0]/messages/by-id/CA+hUKGKDe98DFWKJoS7e4Z+Oamzc-1sZfpL3V3PPgi1uNvQ1tw@mail.gmail.com -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services, pg_upgrade currently does not preserve the version
of collation objects created by initdb. Here is an attempt to fix that.

The way I deal with this here is by having the binary-upgrade mode in
pg_dump delete all the collations created by initdb and then dump out
CREATE COLLATION commands with version information normally.

I had originally imagined doing some kind of ALTER COLLATION (or perhaps
a direct UPDATE pg_collation) to update the version information, but
that doesn't really work because we don't know whether the collation
object with a given name in the new cluster is the same as the one in
the old cluster. So it seems more robust to just delete all existing
collations and create them from scratch.

Thoughts?

[0]: /messages/by-id/CA+hUKGKDe98DFWKJoS7e4Z+Oamzc-1sZfpL3V3PPgi1uNvQ1tw@mail.gmail.com -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
/messages/by-id/CA+hUKGKDe98DFWKJoS7e4Z+Oamzc-1sZfpL3V3PPgi1uNvQ1tw@mail.gmail.com
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v1-0001-Preserve-versions-of-initdb-created-collations-in.patchtext/plain; charset=UTF-8; name=v1-0001-Preserve-versions-of-initdb-created-collations-in.patch; x-mac-creator=0; x-mac-type=0Download
From f4e804b3c1ccc7c36fe8d253ae74955ab1855448 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 28 Oct 2019 13:38:56 +0100
Subject: [PATCH v1] Preserve versions of initdb-created collations in
 pg_upgrade

---
 src/bin/pg_dump/pg_dump.c | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index bf69adc2f4..f54ea2e73e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1649,6 +1649,28 @@ selectDumpableCast(CastInfo *cast, Archive *fout)
 			DUMP_COMPONENT_ALL : DUMP_COMPONENT_NONE;
 }
 
+/*
+ * selectDumpableCollation: policy-setting subroutine
+ *		Mark a collation as to be dumped or not
+ */
+static void
+selectDumpableCollation(CollInfo *collinfo, Archive *fout)
+{
+	DumpOptions *dopt = fout->dopt;
+
+	if (checkExtensionMembership(&collinfo->dobj, fout))
+		return;					/* extension membership overrides all else */
+
+	collinfo->dobj.dump = collinfo->dobj.namespace->dobj.dump_contains;
+
+	/*
+	 * Collations in pg_catalog created by initdb must be dumped in
+	 * binary-upgrade mode to preserve the collation version.
+	 */
+	if (dopt->binary_upgrade && collinfo->dobj.catId.oid >= FirstBootstrapObjectId)
+			collinfo->dobj.dump = DUMP_COMPONENT_ALL;
+}
+
 /*
  * selectDumpableProcLang: policy-setting subroutine
  *		Mark a procedural language as to be dumped or not
@@ -2980,6 +3002,10 @@ dumpDatabase(Archive *fout)
 						  frozenxid, minmxid);
 		appendStringLiteralAH(creaQry, datname, fout);
 		appendPQExpBufferStr(creaQry, ";\n");
+
+		appendPQExpBufferStr(creaQry, "\n-- Remove collations created by initdb.\n");
+		appendPQExpBuffer(creaQry, "DELETE FROM pg_catalog.pg_collation WHERE oid > %u;\n",
+						  FirstBootstrapObjectId);
 	}
 
 	if (creaQry->len > 0)
@@ -5107,7 +5133,8 @@ getCollations(Archive *fout, int *numCollations)
 	appendPQExpBuffer(query, "SELECT tableoid, oid, collname, "
 					  "collnamespace, "
 					  "(%s collowner) AS rolname "
-					  "FROM pg_collation",
+					  "FROM pg_collation "
+					  "WHERE collencoding = (SELECT encoding FROM pg_database WHERE datname = current_database()) OR collencoding = -1",
 					  username_subquery);
 
 	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
@@ -5136,7 +5163,7 @@ getCollations(Archive *fout, int *numCollations)
 		collinfo[i].rolname = pg_strdup(PQgetvalue(res, i, i_rolname));
 
 		/* Decide whether we want to dump it */
-		selectDumpableObject(&(collinfo[i].dobj), fout);
+		selectDumpableCollation(&collinfo[i], fout);
 
 		/* Collations do not currently have ACLs. */
 		collinfo[i].dobj.dump &= ~DUMP_COMPONENT_ACL;

base-commit: 61ecea45e50bcd3b87d4e905719e63e41d6321ce
-- 
2.23.0

#2Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#1)
Re: Preserve versions of initdb-created collations in pg_upgrade

On Tue, Oct 29, 2019 at 1:52 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

As mentioned in [0], pg_upgrade currently does not preserve the version
of collation objects created by initdb. Here is an attempt to fix that.

The way I deal with this here is by having the binary-upgrade mode in
pg_dump delete all the collations created by initdb and then dump out
CREATE COLLATION commands with version information normally.

This seems to be basically OK.

It does mean that the target database has collation OIDs >=
FirstNormalObjectId. That is, they don't look like initdb-created
objects, which is OK because they aren't, I'm just highlighting this
to see if anyone else sees a problem with it. Suppose you pg_upgrade
again: now you'll dump these collations just as you did the first time
around, because they look exactly like user-defined collations. It
also means that if you pg_upgrade to a target cluster created by a
build without ICU we'll try to create ICU collations and that'll fail
("ICU is not supported in this build"), whereas before if had ICU
collations and didn't ever make use of them, you'd be able to do such
an upgrade; again this doesn't seem like a major problem, it's just an
observation about an edge case. One more thing to note is if you
upgrade from 12 to 13 on a glibc system, I think we'll automatically
pick up the *current* version when creating the collations in the target
DB, which seems to be OK but it is a choice to default to assuming
that the database's indexes are not corrupted. Another observation is
that you finish up with different OIDs in each database you upgrade,
which again doesn't seem like a problem in itself. It is slightly odd that
template1 finishes up with the old initdb's template1 collatoins, rather
than the new initdb's opinion of the current set of collations, but I am
not sure if it's a problem. I think it has to be like that, because you
might have created other stuff that depends on those collations in your
source template1 database, and so you have to preserve the versions.

I had originally imagined doing some kind of ALTER COLLATION (or perhaps
a direct UPDATE pg_collation) to update the version information, but
that doesn't really work because we don't know whether the collation
object with a given name in the new cluster is the same as the one in
the old cluster. So it seems more robust to just delete all existing
collations and create them from scratch.

Thoughts?

Seems to work as described with -E UTF-8, but it fails with clusters
using -E SQL_ASCII. That causes the pg_upgrade check to fail on
machines where that is the default encoding chosen by initdb (where
unpatched master succeeds):

pg_restore: creating COLLATION "pg_catalog.af-NA-x-icu"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 1700; 3456 12683 COLLATION af-NA-x-icu tmunro
pg_restore: error: could not execute query: ERROR: collation
"pg_catalog.af-NA-x-icu" for encoding "SQL_ASCII" does not exist
Command was: ALTER COLLATION pg_catalog."af-NA-x-icu" OWNER TO tmunro;

#3Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Thomas Munro (#2)
Re: Preserve versions of initdb-created collations in pg_upgrade

On 2019-10-29 03:33, Thomas Munro wrote:

Seems to work as described with -E UTF-8, but it fails with clusters
using -E SQL_ASCII. That causes the pg_upgrade check to fail on
machines where that is the default encoding chosen by initdb (where
unpatched master succeeds):

pg_restore: creating COLLATION "pg_catalog.af-NA-x-icu"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 1700; 3456 12683 COLLATION af-NA-x-icu tmunro
pg_restore: error: could not execute query: ERROR: collation
"pg_catalog.af-NA-x-icu" for encoding "SQL_ASCII" does not exist
Command was: ALTER COLLATION pg_catalog."af-NA-x-icu" OWNER TO tmunro;

This could be addressed by using is_encoding_supported_by_icu() in
pg_dump to filter out collations with unsupported encodings.

However, the more I look at this whole problem, I'm wondering whether it
wouldn't be preferable to avoid this whole mess by just not creating any
collations in initdb. What do you think?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#3)
Re: Preserve versions of initdb-created collations in pg_upgrade

On Sat, Dec 21, 2019 at 7:38 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 2019-10-29 03:33, Thomas Munro wrote:

Seems to work as described with -E UTF-8, but it fails with clusters
using -E SQL_ASCII. That causes the pg_upgrade check to fail on
machines where that is the default encoding chosen by initdb (where
unpatched master succeeds):

pg_restore: creating COLLATION "pg_catalog.af-NA-x-icu"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 1700; 3456 12683 COLLATION af-NA-x-icu tmunro
pg_restore: error: could not execute query: ERROR: collation
"pg_catalog.af-NA-x-icu" for encoding "SQL_ASCII" does not exist
Command was: ALTER COLLATION pg_catalog."af-NA-x-icu" OWNER TO tmunro;

This could be addressed by using is_encoding_supported_by_icu() in
pg_dump to filter out collations with unsupported encodings.

However, the more I look at this whole problem, I'm wondering whether it
wouldn't be preferable to avoid this whole mess by just not creating any
collations in initdb. What do you think?

I think this problem goes away if we commit the per-object collation
version patch set[1]/messages/by-id/CAEepm=0uEQCpfq_+LYFBdArCe4Ot98t1aR4eYiYTe=yavQygiQ@mail.gmail.com. It drops the collversion column, and Julien's
recent versions handle pg_upgrade quite well, as long as a collation
by the same name exists in the target cluster. In that universe, if
initdb didn't create them, we'd have to tell people to create all
necessary collations manually before doing a pg_upgrade into it, and
that doesn't seem great. Admittedly there might be some weird cases
where a collation is somehow completely different but has the same
name.

[1]: /messages/by-id/CAEepm=0uEQCpfq_+LYFBdArCe4Ot98t1aR4eYiYTe=yavQygiQ@mail.gmail.com

#5Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Thomas Munro (#4)
Re: Preserve versions of initdb-created collations in pg_upgrade

On 2019-12-21 09:01, Thomas Munro wrote:

I think this problem goes away if we commit the per-object collation
version patch set[1]. It drops the collversion column, and Julien's
recent versions handle pg_upgrade quite well, as long as a collation
by the same name exists in the target cluster. In that universe, if
initdb didn't create them, we'd have to tell people to create all
necessary collations manually before doing a pg_upgrade into it, and
that doesn't seem great. Admittedly there might be some weird cases
where a collation is somehow completely different but has the same
name.

Setting this patch to Returned with Feedback.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services