locales and encodings Oh MY!
Hi all,
So today while I investigating some slow queries I found out that some time
ago some of our databases (due, innocently I suspect, to upgrades or
hardware migrations) are now not using C as lc_ctype. And why some 'things'
got a lot slower...
To correct this problem it seems like I am going to have to do a dump,
intidb, then restore. (or add some new indexes and eat the extra overhead).
Unless someone knows something I don't about changing ctype and collation
with some sleight of hand (if you do I am very interested)
I have listed what I think I will be doing with regards to initdb. if anyone
sees problems with the following mixture during my dump -> init-> restore I
would be most keen in hearing about it.
initdb /path/to/data/dir --lc_ctype=C --lc_collation=C
--lc_message=en_US.UTF8 --lc_monetary=en_US.UTF8 --lc_numeric=en_US.UTF8
--lc_time=en_US.UTF8 -E UTF8
Thank you,
Mark
Hi Mark,
Il 12/11/10 03:31, mark ha scritto:
I have listed what I think I will be doing with regards to initdb. if anyone
sees problems with the following mixture during my dump -> init-> restore I
would be most keen in hearing about it.
FYI, PostgreSQL 8.4 introduced database level collation. Before that,
yes, you need to re-init your data dir.
initdb /path/to/data/dir --lc_ctype=C --lc_collation=C
--lc_message=en_US.UTF8 --lc_monetary=en_US.UTF8 --lc_numeric=en_US.UTF8
--lc_time=en_US.UTF8 -E UTF8
Maybe you meant --lc_collate ?
Cheers,
Gabriele
--
Gabriele Bartolini - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it
On Fri, Nov 12, 2010 at 12:45 AM, Gabriele Bartolini
<gabriele.bartolini@2ndquadrant.it> wrote:
Hi Mark,
Il 12/11/10 03:31, mark ha scritto:
I have listed what I think I will be doing with regards to initdb. if
anyone
sees problems with the following mixture during my dump -> init->
restore I
would be most keen in hearing about it.FYI, PostgreSQL 8.4 introduced database level collation. Before that, yes,
you need to re-init your data dir.
Guess I missed that, I have 8.3.X boxes in production and 9.0.1 boxes
in dev so I guess only someone of them will require a re-init.
initdb /path/to/data/dir --lc_ctype=C --lc_collation=C
--lc_message=en_US.UTF8 --lc_monetary=en_US.UTF8 --lc_numeric=en_US.UTF8
--lc_time=en_US.UTF8 -E UTF8Maybe you meant --lc_collate ?
Yes I did mean lc_collate - thanks
Cheers,
Gabriele
With how similar straight C and en_US.UTF8 are it was suggested to me,
by persons who are far more C knowledgeable then I in my office, that
this is something the PG community could "fix" . A "fix" being so that
"col LIKE 'foo%' " could use btree indexes in locales like en_US.UTF8
(and probably some others).
@hackers ->
is the request unreasonable ? anyone got any idea of the price tag to
make that happen ?
Show quoted text
--
Gabriele Bartolini - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it
mark wrote:
A "fix" being so that "col LIKE 'foo%' " could use btree indexes in
locales like en_US.UTF8 (and probably some others).
How about specifying an opclass?:
http://www.postgresql.org/docs/current/interactive/indexes-opclass.html
-Kevin
Import Notes
Resolved by subject fallback
With how similar straight C and en_US.UTF8 are it was suggested to me,
by persons who are far more C knowledgeable then I in my office, that
this is something the PG community could "fix" . A "fix" being so that
"col LIKE 'foo%' " could use btree indexes in locales like en_US.UTF8
(and probably some others).@hackers ->
is the request unreasonable ? anyone got any idea of the price tag to
make that happen ?
I thought it already did that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
With how similar straight C and en_US.UTF8 are it was suggested to me,
by persons who are far more C knowledgeable then I in my office, that
this is something the PG community could "fix" . A "fix" being so that
"col LIKE 'foo%' " could use btree indexes in locales like en_US.UTF8
(and probably some others).
is the request unreasonable ? anyone got any idea of the price tag to
make that happen ?
I thought it already did that.
No, and the odds of it ever happening are insignificant. The sort order
associated with en_US (and other "dictionary order" locales) is just too
randomly different from what you need to optimize a LIKE search.
(Whoever told you en_US sorts similarly to C is nuts.)
The solution if you want the database's prevailing sort order to be en_US
is to put an extra text_pattern_ops index on the column you want to do
LIKE searches on. We might eventually have the ability to spell that
"put a C-locale index on the column", but text_pattern_ops is the way to
do it today.
regards, tom lane
On Fri, Nov 12, 2010 at 12:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
With how similar straight C and en_US.UTF8 are it was suggested to me,
by persons who are far more C knowledgeable then I in my office, that
this is something the PG community could "fix" . A "fix" being so that
"col LIKE 'foo%' " could use btree indexes in locales like en_US.UTF8
(and probably some others).is the request unreasonable ? anyone got any idea of the price tag to
make that happen ?I thought it already did that.
No, and the odds of it ever happening are insignificant. The sort order
associated with en_US (and other "dictionary order" locales) is just too
randomly different from what you need to optimize a LIKE search.
(Whoever told you en_US sorts similarly to C is nuts.)The solution if you want the database's prevailing sort order to be en_US
is to put an extra text_pattern_ops index on the column you want to do
LIKE searches on. We might eventually have the ability to spell that
"put a C-locale index on the column", but text_pattern_ops is the way to
do it today.regards, tom lane
Ok I hear you loud and clear. I am going to eat the overhead until
I get to 9.0.1, currently on 8.3.X in some places.
I will either take an outage and do a dump - re-init-restore or
inplace upgrade and then do some locking, copy, drop old, rename new
db path.
thanks all.
..: Mark