pg_upgrade, locale and encoding

Started by Heikki Linnakangas over 11 years ago, 2 messages, hackers
#1 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com

While looking at bug #11431, I noticed that pg_upgrade still seems to
think that encoding and locale are cluster-wide properties. We got
per-database locale support in 8.4, and encoding has been per-database
much longer than that.

pg_upgrade checks the encoding and locale of template0 in both clusters,
and throws an error if they don't match. But it doesn't check the locale
or encoding of postgres or template1 databases. That leads to problems
if e.g. the postgres database was dropped and recreated with a different
encoding or locale in the old cluster. We will merrily upgrade it, but
strings in the database will be incorrectly encoded.
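A minimal sketch of a per-database check follows. It is not the code in the attached patch, and the names (DbLocaleInfo, find_db, check_databases_compatible) are hypothetical rather than pg_upgrade's real structures. The sketch compares encoding, collate and ctype for every old-cluster database that also exists in the new cluster (such as template1 and postgres). It assumes that any other database is recreated from the dump with its own settings and therefore cannot mismatch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-database record; field names are illustrative only. */
typedef struct
{
    const char *datname;  /* database name */
    int         encoding; /* numeric encoding id, as in pg_database.encoding */
    const char *collate;  /* as in pg_database.datcollate */
    const char *ctype;    /* as in pg_database.datctype */
} DbLocaleInfo;

/* Return the database with the given name, or NULL if it is absent. */
static const DbLocaleInfo *
find_db(const DbLocaleInfo *dbs, int ndbs, const char *name)
{
    for (int i = 0; i < ndbs; i++)
        if (strcmp(dbs[i].datname, name) == 0)
            return &dbs[i];
    return NULL;
}

/* Check every old-cluster database that already exists in the new
 * cluster, not just template0. Locale names are compared exactly here,
 * which is a simplification. Returns the number of mismatches. */
static int
check_databases_compatible(const DbLocaleInfo *old_dbs, int n_old,
                           const DbLocaleInfo *new_dbs, int n_new)
{
    int mismatches = 0;

    for (int i = 0; i < n_old; i++)
    {
        const DbLocaleInfo *o = &old_dbs[i];
        const DbLocaleInfo *n = find_db(new_dbs, n_new, o->datname);

        if (n == NULL)
            continue; /* assumed to be recreated from the dump */
        if (o->encoding != n->encoding ||
            strcmp(o->collate, n->collate) != 0 ||
            strcmp(o->ctype, n->ctype) != 0)
        {
            fprintf(stderr, "encoding/locale mismatch for database \"%s\"\n",
                    o->datname);
            mismatches++;
        }
    }
    return mismatches;
}
```

In the scenario above, a postgres database that was dropped and recreated with a different encoding in the old cluster is reported, while template0 still matches.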

I propose the attached patch, for git master. It's more complicated in
back-branches, as they still support upgrading from pre-8.4 clusters. We
haven't heard any complaints from the field on this, so I don't think
it's worth trying to back-patch this.

This slightly changes the way the locale comparison works. First, it
ignores the encoding suffix of the locale name. It's of course important
that the databases have a compatible encoding, but pg_database has a
separate field for encoding, and that's now compared directly. Secondly,
it tries to canonicalize the names, by calling setlocale(). That seems
like a good idea, in response to bug #11431
(/messages/by-id/5424090E.9060700@vmware.com).
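The two comparison rules can be sketched in C as shown below. This is an illustration under stated assumptions, not the patch itself. The helpers copy_string, canonicalize_locale, locale_base_len and locales_equivalent are hypothetical, and the sketch assumes the encoding suffix starts at the first '.' of the locale name. Canonicalization works by setting the locale with setlocale() and reading back the name the C library reports.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate a string with malloc(); returns NULL on allocation failure. */
static char *
copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Ask the C library for its spelling of a locale name by setting it and
 * reading the name back, then restore the previous setting. Returns a
 * malloc'd copy, or NULL if setlocale() rejects the name. */
static char *
canonicalize_locale(int category, const char *name)
{
    const char *cur = setlocale(category, NULL);
    char *saved = cur ? copy_string(cur) : NULL;
    const char *res = setlocale(category, name);
    /* copy right away: a later setlocale() call may overwrite res */
    char *canon = res ? copy_string(res) : NULL;

    if (saved != NULL)
    {
        setlocale(category, saved);
        free(saved);
    }
    return canon;
}

/* Length of the name without its encoding suffix, assumed to start at
 * the first '.': "en_US.UTF-8" -> 5. */
static size_t
locale_base_len(const char *name)
{
    const char *dot = strchr(name, '.');

    return dot ? (size_t) (dot - name) : strlen(name);
}

/* Treat two locale names as equivalent if their canonical forms match
 * once the encoding suffix is ignored. The encoding is assumed to be
 * compared separately. Falls back to the raw name if setlocale() fails. */
static int
locales_equivalent(int category, const char *a, const char *b)
{
    char *ca = canonicalize_locale(category, a);
    char *cb = canonicalize_locale(category, b);
    const char *na = ca ? ca : a;
    const char *nb = cb ? cb : b;
    size_t la = locale_base_len(na);
    size_t lb = locale_base_len(nb);
    int eq = (la == lb && strncmp(na, nb, la) == 0);

    free(ca);
    free(cb);
    return eq;
}
```

With this rule, "en_US.UTF-8" and "en_US.utf8" compare as equivalent because only "en_US" is compared. setlocale() changes process-wide state, so this kind of probing is only safe when no other thread depends on the locale at the same time.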

- Heikki

Attachments:

0001-In-pg_upgrade-check-the-encoding-and-locale-of-templ.patch (text/x-diff, +85 -174)
#2 Bruce Momjian
bruce@momjian.us
In reply to: Heikki Linnakangas (#1)
Re: pg_upgrade, locale and encoding

On Tue, Oct 7, 2014 at 03:52:24PM +0300, Heikki Linnakangas wrote:

> While looking at bug #11431, I noticed that pg_upgrade still seems
> to think that encoding and locale are cluster-wide properties. We
> got per-database locale support in 8.4, and encoding has been
> per-database much longer than that.
>
> pg_upgrade checks the encoding and locale of template0 in both
> clusters, and throws an error if they don't match. But it doesn't
> check the locale or encoding of postgres or template1 databases.
> That leads to problems if e.g. the postgres database was dropped and
> recreated with a different encoding or locale in the old cluster. We
> will merrily upgrade it, but strings in the database will be
> incorrectly encoded.

Wow, I never thought someone would do that, but they certainly could ---
good catch.

> I propose the attached patch, for git master. It's more complicated
> in back-branches, as they still support upgrading from pre-8.4
> clusters. We haven't heard any complaints from the field on this, so
> I don't think it's worth trying to back-patch this.

Agreed.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +
