Can LIKE under utf8 use INDEXes?

Started by Robert Jamesover 16 years ago10 messagesgeneral
Jump to latest
#1Robert James
srobertjames@gmail.com

Hi. I'm confused about the behavior of LIKE under utf8 locale.
Accoding to the docs (
http://www.postgresql.org/docs/8.2/interactive/locale.html - excerpted
below), it seems that LIKE ignores locale and hence can't use indexes. Yet,
EXPLAIN clearly shows it using indexes.
The docs suggest a workaround, to allow LIKE to use indexes - but I couldn't
figure it out. Although I'm stuck with locale utf8, all my data is 7-bit
ascii. I'm doing a tremendous amount of WHERE x LIKE 'abc%' - what's the
best way to set up a good index?
(I can change the settings for this database - but the cluster must remain
utf8).
Thanks!
(Here is the doc excerpt, from
http://www.postgresql.org/docs/8.2/interactive/locale.html :
"The drawback of using locales other than C or POSIX in PostgreSQL is its
performance impact. It slows character handling and prevents ordinary
indexes from being used by LIKE. For this reason use locales only if you
actually need them. As a workaround to allow PostgreSQL to use indexes with
LIKE clauses under a non-C locale, several custom operator classes exist.
These allow the creation of an index that performs a strict
character-by-character comparison, ignoring locale comparison rules. Refer
to Section 11.8 for more information.")

#2Bruce Momjian
bruce@momjian.us
In reply to: Robert James (#1)
Re: Can LIKE under utf8 use INDEXes?

On Wed, Jul 22, 2009 at 5:57 PM, Robert James<srobertjames@gmail.com> wrote:

Hi.  I'm confused about the behavior of LIKE under utf8 locale.
Accoding to the docs (
http://www.postgresql.org/docs/8.2/interactive/locale.html - excerpted
below), it seems that LIKE ignores locale and hence can't use indexes.  Yet,
EXPLAIN clearly shows it using indexes.

Are you sure you're using 8.2?

--
greg
http://mit.edu/~gsstark/resume.pdf

#3Jeff Davis
pgsql@j-davis.com
In reply to: Robert James (#1)
Re: Can LIKE under utf8 use INDEXes?

On Wed, 2009-07-22 at 12:57 -0400, Robert James wrote:

The docs suggest a workaround, to allow LIKE to use indexes - but I
couldn't figure it out. Although I'm stuck with locale utf8, all my
data is 7-bit ascii. I'm doing a tremendous amount of WHERE x LIKE
'abc%' - what's the best way to set up a good index?

Create the index using text_pattern_ops, and I think it will do what you
want.

CREATE INDEX foo_t_idx ON foo (t text_pattern_ops);

Regards,
Jeff Davis

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert James (#1)
Re: Can LIKE under utf8 use INDEXes?

Robert James <srobertjames@gmail.com> writes:

Hi. I'm confused about the behavior of LIKE under utf8 locale.

UTF8 is not a locale, it's an encoding. If you're using C locale then
LIKE can use indexes, regardless of the encoding. If you're using
some other locale then you need a pattern_ops index.

regards, tom lane

#5Robert James
srobertjames@gmail.com
In reply to: Tom Lane (#4)
Re: Can LIKE under utf8 use INDEXes?

Thank you, Tom. I guess I'm a bit confused about things here. How can I
find the locale of my database? (I wasn't able to find this in the docs).
If I do have the locale set to 'C', do I loose anything by using utf8 for
all text fields?

On Wed, Jul 22, 2009 at 4:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Robert James <srobertjames@gmail.com> writes:

Hi. I'm confused about the behavior of LIKE under utf8 locale.

UTF8 is not a locale, it's an encoding. If you're using C locale then
LIKE can use indexes, regardless of the encoding. If you're using
some other locale then you need a pattern_ops index.

regards, tom lane

#6Andreas Wenk
a.wenk@netzmeister-st-pauli.de
In reply to: Robert James (#5)
Re: Can LIKE under utf8 use INDEXes?

Robert James wrote:

Thank you, Tom. I guess I'm a bit confused about things here. How can
I find the locale of my database? (I wasn't able to find this in the docs).
If I do have the locale set to 'C', do I loose anything by using utf8
for all text fields?

use psql:

postgres=# \l+
List of databases
Name | Owner | Encoding | Collation | Ctype |
----------------+-----------+----------+-------------+-------------+
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |

output shortend ;-)

Cheers

Andy

#7Robert James
srobertjames@gmail.com
In reply to: Andreas Wenk (#6)
Re: Can LIKE under utf8 use INDEXes?

Thanks - I don't show any locale:
rbt_development=> \l
List of databases
Name | Owner | Encoding
----------------------------+---------------------+----------
rbt_development | rbt | UTF8
...

On Wed, Jul 22, 2009 at 6:45 PM, Andreas Wenk <
a.wenk@netzmeister-st-pauli.de> wrote:

Show quoted text

Robert James wrote:

Thank you, Tom. I guess I'm a bit confused about things here. How can I
find the locale of my database? (I wasn't able to find this in the docs).
If I do have the locale set to 'C', do I loose anything by using utf8 for
all text fields?

use psql:

postgres=# \l+
List of databases
Name | Owner | Encoding | Collation | Ctype |
----------------+-----------+----------+-------------+-------------+
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |

output shortend ;-)

Cheers

Andy

#8Andreas Wenk
a.wenk@netzmeister-st-pauli.de
In reply to: Robert James (#7)
Re: Can LIKE under utf8 use INDEXes?

Robert James wrote:

Thanks - I don't show any locale:
rbt_development=> \l
List of databases
Name | Owner | Encoding
----------------------------+---------------------+----------
rbt_development | rbt | UTF8
...

ahm - you are running pg 8.2. There I think the + option is not
available (\l+). So if you use a debian based system and installed it
via the package manager apt or aptitude you could give this a try:

/usr/lib/postgresql/8.2/bin/pg_controldata \
/var/lib/postgresql/8.2/main/ |grep LC

This should output something like:

LC_COLLATE: de_DE.UTF-8
LC_CTYPE: de_DE.UTF-8

I hope this helps a little ...

Cheers

Andy

P.S.: top posting is ugly ;-)

Show quoted text

On Wed, Jul 22, 2009 at 6:45 PM, Andreas Wenk
<a.wenk@netzmeister-st-pauli.de <mailto:a.wenk@netzmeister-st-pauli.de>>
wrote:

Robert James wrote:

Thank you, Tom. I guess I'm a bit confused about things here.
How can I find the locale of my database? (I wasn't able to
find this in the docs).
If I do have the locale set to 'C', do I loose anything by using
utf8 for all text fields?

use psql:

postgres=# \l+
List of databases
Name | Owner | Encoding | Collation | Ctype |
----------------+-----------+----------+-------------+-------------+
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |

output shortend ;-)

Cheers

Andy

#9Andreas Wenk
a.wenk@netzmeister-st-pauli.de
In reply to: Andreas Wenk (#8)
Re: Can LIKE under utf8 use INDEXes?

Andreas Wenk wrote:

Robert James wrote:

Thanks - I don't show any locale:
rbt_development=> \l
List of databases
Name | Owner | Encoding
----------------------------+---------------------+----------
rbt_development | rbt | UTF8
...

ahm - you are running pg 8.2. There I think the + option is not
available (\l+). So if you use a debian based system and installed it
via the package manager apt or aptitude you could give this a try:

/usr/lib/postgresql/8.2/bin/pg_controldata \
/var/lib/postgresql/8.2/main/ |grep LC

This should output something like:

LC_COLLATE: de_DE.UTF-8
LC_CTYPE: de_DE.UTF-8

I hope this helps a little ...

Cheers

Andy

P.S.: top posting is ugly ;-)

more correct: \l+ is also available in 8.2 but the output in 8.4 is
extended ...

Cheers

Andy

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Wenk (#8)
Re: Can LIKE under utf8 use INDEXes?

Andreas Wenk <a.wenk@netzmeister-st-pauli.de> writes:

Robert James wrote:

Thanks - I don't show any locale:

ahm - you are running pg 8.2. There I think the + option is not
available (\l+). So if you use a debian based system and installed it
via the package manager apt or aptitude you could give this a try:

/usr/lib/postgresql/8.2/bin/pg_controldata \
/var/lib/postgresql/8.2/main/ |grep LC

Easier way is "SHOW LC_COLLATE" and "SHOW LC_CTYPE" ...

regards, tom lane