locale

Started by Dennis Bjorklund | about 22 years ago | 35 messages | hackers
#1Dennis Bjorklund
db@zigo.dhs.org

Is anyone working to make the locale support in pg better? Running initdb
to set the locale is a bit heavy. It would be nice to at least be able to
set it per database.

--
/Dennis Björklund

#2Bruce Momjian
bruce@momjian.us
In reply to: Dennis Bjorklund (#1)
Re: locale

Dennis Bjorklund wrote:

Is anyone working to make the locale support in pg better? Running initdb
to set the locale is a bit heavy. It would be nice to at least be able to
set it per database.

Uh, createdb and CREATE DATABASE both have encoding options. initdb
only sets the encoding for template1, and the default for future
databases, but you can override it.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#3Bruce Momjian
bruce@momjian.us
In reply to: Dennis Bjorklund (#1)
Re: locale

Dennis Bjorklund wrote:

Is anyone working to make the locale support in pg better? Running initdb
to set the locale is a bit heavy. It would be nice to at least be able to
set it per database.

Oops, I confused locale and multibyte. Yes, I don't see a way to change
locale for new databases, but I don't see why we can't.

The initdb manual page says:

initdb initializes the database cluster's default locale
and character set encoding. Some locale categories are
fixed for the lifetime of the cluster, so it is important
to make the right choice when running initdb. Other
locale categories can be changed later when the server is
started. initdb will write those locale settings into the
postgresql.conf configuration file so they are the
default, but they can be changed by editing that file. To
set the locale that initdb uses, see the description of
the --locale option. The character set encoding can be set
separately for each database as it is created. initdb
determines the encoding for the template1 database, which
will serve as the default for all other databases. To
alter the default encoding use the --encoding option.

and

--locale=locale
    Sets the default locale for the database cluster.
    If this option is not specified, the locale is
    inherited from the environment that initdb runs in.

--lc-collate=locale
--lc-ctype=locale
--lc-messages=locale
--lc-monetary=locale
--lc-numeric=locale
--lc-time=locale
    Like --locale, but only sets the locale in the
    specified category.

My only guess is that you can use ALTER DATABASE SET to set some of the
values when someone connects to the database.

Looking at guc.c I see:

    {
        {"lc_collate", PGC_INTERNAL, CLIENT_CONN_LOCALE,
            gettext_noop("Shows the collation order locale."),
            NULL,
            GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
        },
        &locale_collate,
        "C", NULL, NULL
    },

    {
        {"lc_ctype", PGC_INTERNAL, CLIENT_CONN_LOCALE,
            gettext_noop("Shows the character classification and case conversion locale."),
            NULL,
            GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
        },
        &locale_ctype,
        "C", NULL, NULL
    },

    {
        {"lc_messages", PGC_SUSET, CLIENT_CONN_LOCALE,
            gettext_noop("Sets the language in which messages are displayed."),
            NULL
        },
        &locale_messages,
        "", locale_messages_assign, NULL
    },

    {
        {"lc_monetary", PGC_USERSET, CLIENT_CONN_LOCALE,
            gettext_noop("Sets the locale for formatting monetary amounts."),
            NULL
        },
        &locale_monetary,
        "C", locale_monetary_assign, NULL
    },

    {
        {"lc_numeric", PGC_USERSET, CLIENT_CONN_LOCALE,
            gettext_noop("Sets the locale for formatting numbers."),
            NULL
        },
        &locale_numeric,
        "C", locale_numeric_assign, NULL
    },

    {
        {"lc_time", PGC_USERSET, CLIENT_CONN_LOCALE,
            gettext_noop("Sets the locale for formatting date and time values."),
            NULL
        },
        &locale_time,
        "C", locale_time_assign, NULL
    },

You can't change the internal ones, but you can modify some of the others.

Anyone know why we don't allow locale to be set per database?

#4Peter Eisentraut
peter_e@gmx.net
In reply to: Dennis Bjorklund (#1)
Re: locale

Dennis Bjorklund wrote:

Is anyone working to make the locale support in pg better? Running
initdb to set the locale is a bit heavy. It would be nice to at least
be able to set it per database.

I was supposed to do that but I got distracted. I sent out a longish
implementation and transition plan some time ago, if you're interested.
Setting the locale per database is quite doable actually, you just need
a plan to prevent corruption of the shared system catalogs and you need
to deal with modifications of the template database(s). There was some
discussion about that as well. See the thread "Translations in the
distributions" around 2004-01-09. I can help out if you want to do
what was discussed there.

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#2)
Re: locale

Bruce Momjian wrote:

Dennis Bjorklund wrote:

Is anyone working to make the locale support in pg better? Running initdb
to set the locale is a bit heavy. It would be nice to at least be able to
set it per database.

Uh, createdb and CREATE DATABASE both have encoding options. initdb
only sets the encoding for template1, and the default for future
databases, but you can override it.

That is true for encoding, but not true for LC_CTYPE and LC_COLLATE
locale settings, which only initdb can set.

cheers

andrew

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#3)
Re: locale

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Anyone know why we don't allow locale to be set per database?

Changing it on the fly would corrupt index sort ordering. See also
Peter's response nearby.

regards, tom lane

#7Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#6)
Re: locale

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Anyone know why we don't allow locale to be set per database?

Changing it on the fly would corrupt index sort ordering. See also
Peter's response nearby.

I was asking why we can't set it to a new static value when we create
the database. I don't think the poster was asking for the ability to
change it after the database was created.

Added to TODO:

* Allow locale to be set at database creation

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#7)
Re: locale

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I was asking why we can't set it to a new static value when we create
the database.

Because that would corrupt indexes on shared tables. (It might be
possible to finesse that, but it's not a no-brainer.)

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#8)
Re: locale

I said:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I was asking why we can't set it to a new static value when we create
the database.

Because that would corrupt indexes on shared tables. (It might be
possible to finesse that, but it's not a no-brainer.)

And even more to the point, it would corrupt non-shared indexes
inherited from template1. This could not be finessed --- AFAICS you'd
need to do the equivalent of a REINDEX in the new database to make it
work.

regards, tom lane

#10Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#8)
Re: locale

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I was asking why we can't set it to a new static value when we create
the database.

Because that would corrupt indexes on shared tables. (It might be
possible to finesse that, but it's not a no-brainer.)

Oh, I hadn't thought of that. The problem isn't encoding, because we
handle that already, but different representations of time and stuff?

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#10)
Re: locale

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

Because that would corrupt indexes on shared tables. (It might be
possible to finesse that, but it's not a no-brainer.)

Oh, I hadn't thought of that. The problem isn't encoding, because we
handle that already, but different representations of time and stuff?

No, the problem is sort ordering of indexes on textual columns.

regards, tom lane

#12Dennis Bjorklund
db@zigo.dhs.org
In reply to: Tom Lane (#9)
Re: locale

On Wed, 7 Apr 2004, Tom Lane wrote:

And even more to the point, it would corrupt non-shared indexes
inherited from template1. This could not be finessed --- AFAICS you'd
need to do the equivalent of a REINDEX in the new database to make it
work.

From what I can tell there are only 3 tables we talk about:

pg_database
pg_shadow
pg_group

and in each case there is the name column that is indexed (that matters to
us, int columns are the same no matter what locale).

These name columns all use the special name datatype, maybe one could
simply treat name differently, like comparing these strings bytewise.

For my small databases I don't even need an index on any of these. But I
can imagine someone having a couple of thousand users.

--
/Dennis Björklund

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#7)
Re: locale

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Added to TODO:
* Allow locale to be set at database creation

BTW, that is redundant with the locale todo items already present.

regards, tom lane

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dennis Bjorklund (#12)
Re: locale

Dennis Bjorklund <db@zigo.dhs.org> writes:

From what I can tell there are only 3 tables we talk about:
pg_database
pg_shadow
pg_group

If that were so, we'd not have a problem. The reason we have to tread
very carefully is that we do not know what tables/indexes users might
have added to template1. If we copy a text index into a new database
and claim that it is sorted by some new locale, we'd be breaking things.

In any case, the whole idea is substantially inferior to the correct
solution, which is per-column locale settings within databases. That
does what we want, is required functionality per SQL spec, and avoids
problems during CREATE DATABASE. It's just a tad harder to do :-(

regards, tom lane

#15Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#13)
Re: locale

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Added to TODO:
* Allow locale to be set at database creation

BTW, that is redundant with the locale todo items already present.

I see:

* Allow locale to be set at database creation
* Allow locale on a per-column basis, default to ASCII

The first seems easier than the second.

#16Dennis Bjorklund
db@zigo.dhs.org
In reply to: Tom Lane (#14)
Re: locale

On Wed, 7 Apr 2004, Tom Lane wrote:

If that were so, we'd not have a problem. The reason we have to tread
very carefully is that we do not know what tables/indexes users might
have added to template1.

Aah, now I see the real problem!

If we copy a text index into a new database and claim that it is sorted
by some new locale, we'd be breaking things.

How is this handled for encodings? You can very well have something in
template1 in an encoding that is not compatible with the encoding you use
to create a new database.

Right now I can't imagine how that was solved.

In any case, the whole idea is substantially inferior to the correct
solution, which is per-column locale settings within databases.

Of course, but that solution might be many years ahead. Had it been fairly
easy to create a database with a different locale it would have been
worth it (and still is if one could come up with some solution).

I have a number of different data directories with different locales, and
add to that a number of different versions of pg and you can imagine
what it looks like when I run ps :-)

--
/Dennis Björklund

#17Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#14)
Re: locale

Tom Lane wrote:

Dennis Bjorklund <db@zigo.dhs.org> writes:

From what I can tell there are only 3 tables we talk about:
pg_database
pg_shadow
pg_group

If that were so, we'd not have a problem. The reason we have to tread
very carefully is that we do not know what tables/indexes users might
have added to template1. If we copy a text index into a new database
and claim that it is sorted by some new locale, we'd be breaking things.

Wouldn't reindex correct that? If so, it could be forced with a flag on
"create database" maybe, or else some test to compare the two locale
settings and force it if necessary?

In any case, the whole idea is substantially inferior to the correct
solution, which is per-column locale settings within databases. That
does what we want, is required functionality per SQL spec, and avoids
problems during CREATE DATABASE. It's just a tad harder to do :-(

Yeah. But everything higher than the table level can surely be finessed
with different locations / databases. Not having this right (i.e. at
the column level) is a great pity, to say the least.

cheers

andrew

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dennis Bjorklund (#16)
Re: locale

Dennis Bjorklund <db@zigo.dhs.org> writes:

On Wed, 7 Apr 2004, Tom Lane wrote:

If we copy a text index into a new database and claim that it is sorted
by some new locale, we'd be breaking things.

How is this handled for encodings? You can very well have something in
template1 in an encoding that is not compatible with the encoding you use
to create a new database.

This is likely broken; but that's no excuse for creating similar
breakage for locale settings. Note that Peter's planned project would
hopefully clean up both of these issues.

In practice, we know that we have seen index failures from altering the
locale settings (back before we installed the code that locks down
LC_COLLATE/LC_CTYPE at initdb time). I do not recall having heard any
reports of index failures that could be traced to changing encoding.
This may be because strcoll() derives its assumptions about encoding
from the LC_CTYPE setting and doesn't actually know what PG thinks the
encoding is. So you might have a stored string that is illegal per the
current encoding, but nonetheless it will sort the same as it did in the
mother database.

regards, tom lane

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dennis Bjorklund (#16)
Re: locale

Dennis Bjorklund <db@zigo.dhs.org> writes:

On Wed, 7 Apr 2004, Tom Lane wrote:

In any case, the whole idea is substantially inferior to the correct
solution, which is per-column locale settings within databases.

Of course, but that solution might be many years ahead.

Peter E. seems to think that it's not an infeasible amount of work.
(See previous discussion that he mentioned earlier in this thread.)

Basically, I'd rather see us tackle that than expend effort on
kluging CREATE DATABASE to allow per-database locales.

regards, tom lane

#20Dennis Bjorklund
db@zigo.dhs.org
In reply to: Tom Lane (#19)
Re: locale

On Wed, 7 Apr 2004, Tom Lane wrote:

solution, which is per-column locale settings within databases.

Of course, but that solution might be many years ahead.

Peter E. seems to think that it's not an infeasible amount of work.
(See previous discussion that he mentioned earlier in this thread.)

I don't know how it should work in theory yet, much less what an
implementation would look like.

What happens when you have two columns with different locales and try to
compare them with the operator <? Is the locale part of the string
type, like text@sv_SE.UTF-8? What does that do to overloaded functions?
What would happen when a locale and an encoding do not match? Should one
just assume that it won't happen?

I've got lots of questions like that, some are probably answered by the
sql standard and others maybe don't have an answer.

Basically, I'd rather see us tackle that than expend effort on
kluging CREATE DATABASE to allow per-database locales.

Don't think for a second that I don't want this. You are an American who
lives in an ASCII world and you want this. You cannot imagine how much I
want it :-)

--
/Dennis Björklund

#21Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Dennis Bjorklund (#16)
#22Dennis Bjorklund
db@zigo.dhs.org
In reply to: Tatsuo Ishii (#21)
#23Honza Pazdziora
adelton@informatics.muni.cz
In reply to: Tom Lane (#18)
#24Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Dennis Bjorklund (#22)
#25Dennis Bjorklund
db@zigo.dhs.org
In reply to: Tatsuo Ishii (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dennis Bjorklund (#25)
#27Dennis Bjorklund
db@zigo.dhs.org
In reply to: Tom Lane (#26)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dennis Bjorklund (#27)
#29Dennis Bjorklund
db@zigo.dhs.org
In reply to: Tom Lane (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dennis Bjorklund (#29)
#31Dennis Bjorklund
db@zigo.dhs.org
In reply to: Tom Lane (#30)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dennis Bjorklund (#31)
#33Dennis Bjorklund
db@zigo.dhs.org
In reply to: Tom Lane (#32)
#34Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#26)
#35Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#34)