Can LIKE under utf8 use INDEXes?
Hi. I'm confused about the behavior of LIKE under utf8 locale.
Accoding to the docs (
http://www.postgresql.org/docs/8.2/interactive/locale.html - excerpted
below), it seems that LIKE ignores locale and hence can't use indexes. Yet,
EXPLAIN clearly shows it using indexes.
The docs suggest a workaround, to allow LIKE to use indexes - but I couldn't
figure it out. Although I'm stuck with locale utf8, all my data is 7-bit
ascii. I'm doing a tremendous amount of WHERE x LIKE 'abc%' - what's the
best way to set up a good index?
(I can change the settings for this database - but the cluster must remain
utf8).
Thanks!
(Here is the doc excerpt, from
http://www.postgresql.org/docs/8.2/interactive/locale.html :
"The drawback of using locales other than C or POSIX in PostgreSQL is its
performance impact. It slows character handling and prevents ordinary
indexes from being used by LIKE. For this reason use locales only if you
actually need them. As a workaround to allow PostgreSQL to use indexes with
LIKE clauses under a non-C locale, several custom operator classes exist.
These allow the creation of an index that performs a strict
character-by-character comparison, ignoring locale comparison rules. Refer
to Section 11.8 for more information.")
On Wed, Jul 22, 2009 at 5:57 PM, Robert James<srobertjames@gmail.com> wrote:
Hi. I'm confused about the behavior of LIKE under utf8 locale.
Accoding to the docs (
http://www.postgresql.org/docs/8.2/interactive/locale.html - excerpted
below), it seems that LIKE ignores locale and hence can't use indexes. Yet,
EXPLAIN clearly shows it using indexes.
Are you sure you're using 8.2?
On Wed, 2009-07-22 at 12:57 -0400, Robert James wrote:
The docs suggest a workaround, to allow LIKE to use indexes - but I
couldn't figure it out. Although I'm stuck with locale utf8, all my
data is 7-bit ascii. I'm doing a tremendous amount of WHERE x LIKE
'abc%' - what's the best way to set up a good index?
Create the index using text_pattern_ops, and I think it will do what you
want.
CREATE INDEX foo_t_idx ON foo (t text_pattern_ops);
Regards,
Jeff Davis
Robert James <srobertjames@gmail.com> writes:
Hi. I'm confused about the behavior of LIKE under utf8 locale.
UTF8 is not a locale, it's an encoding. If you're using C locale then
LIKE can use indexes, regardless of the encoding. If you're using
some other locale then you need a pattern_ops index.
regards, tom lane
Thank you, Tom. I guess I'm a bit confused about things here. How can I
find the locale of my database? (I wasn't able to find this in the docs).
If I do have the locale set to 'C', do I loose anything by using utf8 for
all text fields?
On Wed, Jul 22, 2009 at 4:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
Robert James <srobertjames@gmail.com> writes:
Hi. I'm confused about the behavior of LIKE under utf8 locale.
UTF8 is not a locale, it's an encoding. If you're using C locale then
LIKE can use indexes, regardless of the encoding. If you're using
some other locale then you need a pattern_ops index.regards, tom lane
Robert James wrote:
Thank you, Tom. I guess I'm a bit confused about things here. How can
I find the locale of my database? (I wasn't able to find this in the docs).
If I do have the locale set to 'C', do I loose anything by using utf8
for all text fields?
use psql:
postgres=# \l+
List of databases
Name | Owner | Encoding | Collation | Ctype |
----------------+-----------+----------+-------------+-------------+
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
output shortend ;-)
Cheers
Andy
Thanks - I don't show any locale:
rbt_development=> \l
List of databases
Name | Owner | Encoding
----------------------------+---------------------+----------
rbt_development | rbt | UTF8
...
On Wed, Jul 22, 2009 at 6:45 PM, Andreas Wenk <
a.wenk@netzmeister-st-pauli.de> wrote:
Show quoted text
Robert James wrote:
Thank you, Tom. I guess I'm a bit confused about things here. How can I
find the locale of my database? (I wasn't able to find this in the docs).
If I do have the locale set to 'C', do I loose anything by using utf8 for
all text fields?use psql:
postgres=# \l+
List of databases
Name | Owner | Encoding | Collation | Ctype |
----------------+-----------+----------+-------------+-------------+
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |output shortend ;-)
Cheers
Andy
Robert James wrote:
Thanks - I don't show any locale:
rbt_development=> \l
List of databases
Name | Owner | Encoding
----------------------------+---------------------+----------
rbt_development | rbt | UTF8
...
ahm - you are running pg 8.2. There I think the + option is not
available (\l+). So if you use a debian based system and installed it
via the package manager apt or aptitude you could give this a try:
/usr/lib/postgresql/8.2/bin/pg_controldata \
/var/lib/postgresql/8.2/main/ |grep LC
This should output something like:
LC_COLLATE: de_DE.UTF-8
LC_CTYPE: de_DE.UTF-8
I hope this helps a little ...
Cheers
Andy
P.S.: top posting is ugly ;-)
Show quoted text
On Wed, Jul 22, 2009 at 6:45 PM, Andreas Wenk
<a.wenk@netzmeister-st-pauli.de <mailto:a.wenk@netzmeister-st-pauli.de>>
wrote:Robert James wrote:
Thank you, Tom. I guess I'm a bit confused about things here.
How can I find the locale of my database? (I wasn't able to
find this in the docs).
If I do have the locale set to 'C', do I loose anything by using
utf8 for all text fields?use psql:
postgres=# \l+
List of databases
Name | Owner | Encoding | Collation | Ctype |
----------------+-----------+----------+-------------+-------------+
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |output shortend ;-)
Cheers
Andy
Andreas Wenk wrote:
Robert James wrote:
Thanks - I don't show any locale:
rbt_development=> \l
List of databases
Name | Owner | Encoding
----------------------------+---------------------+----------
rbt_development | rbt | UTF8
...ahm - you are running pg 8.2. There I think the + option is not
available (\l+). So if you use a debian based system and installed it
via the package manager apt or aptitude you could give this a try:/usr/lib/postgresql/8.2/bin/pg_controldata \
/var/lib/postgresql/8.2/main/ |grep LCThis should output something like:
LC_COLLATE: de_DE.UTF-8
LC_CTYPE: de_DE.UTF-8I hope this helps a little ...
Cheers
Andy
P.S.: top posting is ugly ;-)
more correct: \l+ is also available in 8.2 but the output in 8.4 is
extended ...
Cheers
Andy
Andreas Wenk <a.wenk@netzmeister-st-pauli.de> writes:
Robert James wrote:
Thanks - I don't show any locale:
ahm - you are running pg 8.2. There I think the + option is not
available (\l+). So if you use a debian based system and installed it
via the package manager apt or aptitude you could give this a try:
/usr/lib/postgresql/8.2/bin/pg_controldata \
/var/lib/postgresql/8.2/main/ |grep LC
Easier way is "SHOW LC_COLLATE" and "SHOW LC_CTYPE" ...
regards, tom lane