Changing collate & ctype for an existing database

Started by rihadover 8 years ago7 messagesgeneral
Jump to latest
#1rihad
rihad@mail.ru

Hi there. We have a working database that was unfortunately created by
initdb with default ("C") collation & ctype. All other locale specific
settings have the value en_US.UTF-8 in postgresql.conf. The database
itself is multilingual and all its data is stored in UTF-8. Sorting
doesn't work correctly, though. To fix that, can I just do this:

update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
where datname='mydb';

This does seem to work on a testing copy of the database, i.e. select
lower('БлаБлаБла') now works correctly when connected to that database.

Is there still any chance for corrupting data by doing this, or indexes
stopping working etc?

p.s.: postgres 9.6.3

Thanks.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: rihad (#1)
Re: Changing collate & ctype for an existing database

rihad <rihad@mail.ru> writes:

Hi there. We have a working database that was unfortunately created by
initdb with default ("C") collation & ctype. All other locale specific
settings have the value en_US.UTF-8 in postgresql.conf. The database
itself is multilingual and all its data is stored in UTF-8. Sorting
doesn't work correctly, though. To fix that, can I just do this:

update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
where datname='mydb';

No, your indexes on text/char/varchar columns will be corrupted
(because their sort order will now be wrong). If you can reindex
them before doing anything more with the database, you'd be ok
... I think. Testing on a scratch copy of the database would be
a good idea, if this is valuable data.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3rihad
rihad@mail.ru
In reply to: Tom Lane (#2)
Re: Changing collate & ctype for an existing database

On 07/10/2017 08:42 PM, Tom Lane wrote:

rihad <rihad@mail.ru> writes:

Hi there. We have a working database that was unfortunately created by
initdb with default ("C") collation & ctype. All other locale specific
settings have the value en_US.UTF-8 in postgresql.conf. The database
itself is multilingual and all its data is stored in UTF-8. Sorting
doesn't work correctly, though. To fix that, can I just do this:
update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
where datname='mydb';

No, your indexes on text/char/varchar columns will be corrupted
(because their sort order will now be wrong). If you can reindex
them before doing anything more with the database, you'd be ok
... I think. Testing on a scratch copy of the database would be
a good idea, if this is valuable data.

regards, tom lane

Thank you, Tom. But can I still do it for the template1 database?

update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
where datname='template1';

It's empty, only hosting a few extensions. Now I can't even create a
database having a different collation:

$ createdb -O myuser --locale='en_US.UTF-8' mydb
createdb: database creation failed: ERROR: new collation (en_US.UTF-8)
is incompatible with the collation of the template database (C)
HINT: Use the same collation as in the template database, or use
template0 as template.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: rihad (#1)
Re: Changing collate & ctype for an existing database

rihad <rihad@mail.ru> writes:

On 07/10/2017 08:42 PM, Tom Lane wrote:

No, your indexes on text/char/varchar columns will be corrupted
(because their sort order will now be wrong). If you can reindex
them before doing anything more with the database, you'd be ok
... I think. Testing on a scratch copy of the database would be
a good idea, if this is valuable data.

Thank you, Tom. But can I still do it for the template1 database?

update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
where datname='template1';

It'd be safe to do it on template0, and also on template1 as long as that
has only the original contents ...

It's empty, only hosting a few extensions.

... which that isn't. I'd suggest checking for indexes that might need
to be rebuilt with this query borrowed from the regression tests:

SELECT indexrelid::regclass, indrelid::regclass, iclass, icoll
FROM (SELECT indexrelid, indrelid,
unnest(indclass) as iclass, unnest(indcollation) as icoll
FROM pg_index) ss
WHERE icoll != 0 AND iclass !=
(SELECT oid FROM pg_opclass
WHERE opcname = 'text_pattern_ops' AND opcmethod =
(SELECT oid FROM pg_am WHERE amname = 'btree'));

Now I can't even create a database having a different collation:
$ createdb -O myuser --locale='en_US.UTF-8' mydb
createdb: database creation failed: ERROR: new collation (en_US.UTF-8) is incompatible with the collation of the template database (C)
HINT: Use the same collation as in the template database, or use template0 as template.

Read the HINT ...

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5rihad
rihad@mail.ru
In reply to: Tom Lane (#4)
Re: Changing collate & ctype for an existing database

On 07/10/2017 11:07 PM, Tom Lane wrote:

rihad <rihad@mail.ru> writes:

On 07/10/2017 08:42 PM, Tom Lane wrote:

No, your indexes on text/char/varchar columns will be corrupted
(because their sort order will now be wrong). If you can reindex
them before doing anything more with the database, you'd be ok
... I think. Testing on a scratch copy of the database would be
a good idea, if this is valuable data.

Thank you, Tom. But can I still do it for the template1 database?
update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
where datname='template1';

It'd be safe to do it on template0, and also on template1 as long as that
has only the original contents ...

It's empty, only hosting a few extensions.

... which that isn't. I'd suggest checking for indexes that might need
to be rebuilt with this query borrowed from the regression tests:

SELECT indexrelid::regclass, indrelid::regclass, iclass, icoll
FROM (SELECT indexrelid, indrelid,
unnest(indclass) as iclass, unnest(indcollation) as icoll
FROM pg_index) ss
WHERE icoll != 0 AND iclass !=
(SELECT oid FROM pg_opclass
WHERE opcname = 'text_pattern_ops' AND opcmethod =
(SELECT oid FROM pg_am WHERE amname = 'btree'));

I ran the query on our production database. Zero results. Do I have the
green light to

set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'

for all our working databases? :) Or for template0 & template1 only?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: rihad (#5)
Re: Changing collate & ctype for an existing database

rihad <rihad@mail.ru> writes:

On 07/10/2017 11:07 PM, Tom Lane wrote:

... which that isn't. I'd suggest checking for indexes that might need
to be rebuilt with this query borrowed from the regression tests:

I ran the query on our production database. Zero results.

Really? You have no indexes on textual columns? That seems surprising.

Do I have the green light to
set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'

Well, I'd double-check that result, but I suppose you can always reindex
later if you find you missed something.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7rihad
rihad@mail.ru
In reply to: Tom Lane (#6)
Re: Changing collate & ctype for an existing database

On 07/11/2017 12:45 AM, Tom Lane wrote:

rihad <rihad@mail.ru> writes:

On 07/10/2017 11:07 PM, Tom Lane wrote:

... which that isn't. I'd suggest checking for indexes that might need
to be rebuilt with this query borrowed from the regression tests:

I ran the query on our production database. Zero results.

Really? You have no indexes on textual columns? That seems surprising.

Oops, of course we do, around 10-15 per db. I was initially connected to
the postgres database when I ran the query, I thought the query you gave
me was global by looking at it.

So, deciding NOT to reindex all of them risks the corruption of their
relevant tables?
It could be easier to simply drop and restore the db, albeit with some
downtime.
Thank you so much for you help.

Do I have the green light to
set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'

Well, I'd double-check that result, but I suppose you can always reindex
later if you find you missed something.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general