Using index for "like 'ABC%'" type query

Started by lecalmost 23 years ago40 messageshackersgeneral
Jump to latest
#1lec
englim@pc.jaring.my
hackersgeneral

For current version of postgresql, is it possible for a query like below
to utilize an index?

select item_name from item where item_code like 'ABC%';

Earlier versions of postgresql, eg, 6.5.3 can utilize the index.

Thanks,
Thomas.

#2Jeffrey Melloy
jmelloy@visualdistortion.org
In reply to: lec (#1)
hackersgeneral
Re: Using index for "like 'ABC%'" type query

There is a module called "tsearch" in the contrib directory which works
quite well.
On Thursday, April 17, 2003, at 09:56 PM, lec wrote:

Show quoted text

For current version of postgresql, is it possible for a query like
below to utilize an index?

select item_name from item where item_code like 'ABC%';

Earlier versions of postgresql, eg, 6.5.3 can utilize the index.

Thanks,
Thomas.

---------------------------(end of
broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#3Bruce Momjian
bruce@momjian.us
In reply to: Jeffrey Melloy (#2)
hackersgeneral
Re: Using index for "like 'ABC%'" type query

See the FAQ about index usage.

---------------------------------------------------------------------------

Jeffrey Melloy wrote:

There is a module called "tsearch" in the contrib directory which works
quite well.
On Thursday, April 17, 2003, at 09:56 PM, lec wrote:

For current version of postgresql, is it possible for a query like
below to utilize an index?

select item_name from item where item_code like 'ABC%';

Earlier versions of postgresql, eg, 6.5.3 can utilize the index.

Thanks,
Thomas.

---------------------------(end of
broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#4Andrew Sullivan
andrew@libertyrms.info
In reply to: lec (#1)
hackersgeneral
Re: Using index for "like 'ABC%'" type query

On Fri, Apr 18, 2003 at 10:56:52AM +0800, lec wrote:

For current version of postgresql, is it possible for a query like below
to utilize an index?

select item_name from item where item_code like 'ABC%';

Yes. But locale is enabled by default now, and if you use anything
other than C (and other-than-C is increasingly the default on the
supportedplatforms), it won't work. Just make sure that your locale
is C when you do initdb.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Sullivan (#4)
hackersgeneral
default locale considered harmful? (was Re: Using index for "like 'ABC%'" type query)

Andrew Sullivan <andrew@libertyrms.info> writes:

[ can't LIKE use an index? ]

Yes. But locale is enabled by default now, and if you use anything
other than C (and other-than-C is increasingly the default on the
supported platforms), it won't work. Just make sure that your locale
is C when you do initdb.

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

regards, tom lane

#6Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#5)
hackersgeneral
Re: [HACKERS] default locale considered harmful? (was Re:

Tom Lane wrote:

Andrew Sullivan <andrew@libertyrms.info> writes:

[ can't LIKE use an index? ]

Yes. But locale is enabled by default now, and if you use anything
other than C (and other-than-C is increasingly the default on the
supported platforms), it won't work. Just make sure that your locale
is C when you do initdb.

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

Agreed, or we could special-case en_US to be C locale.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#7Dennis Gearon
gearond@cvc.net
In reply to: Tom Lane (#5)
hackersgeneral
Re: default locale considered harmful? (was Re: Using

I whole heartedly agree. It would make it easier to do multilangugages in one db, I think.

When I get done with my current project, about a year from now, I'm going to dig in deep to UTF-8 on postgress and see if I cn write a collation function in C to be used in ORDER BY, statements.

Tom Lane wrote:

Show quoted text

Andrew Sullivan <andrew@libertyrms.info> writes:

[ can't LIKE use an index? ]

Yes. But locale is enabled by default now, and if you use anything
other than C (and other-than-C is increasingly the default on the
supported platforms), it won't work. Just make sure that your locale
is C when you do initdb.

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#8Lincoln Yeoh
lyeoh@pop.jaring.my
In reply to: Tom Lane (#5)
hackersgeneral
Re: default locale considered harmful? (was Re:

Yes. I'd prefer C locale by default.

Usage of other locales seems to incur many issues. Those who know about
locale and want it are usually prepared to turn it on and use it given some
documentation. Those who don't should get C locale.

Link.

At 10:48 AM 4/18/2003 -0400, Tom Lane wrote:

Show quoted text

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#9Darko Prenosil
darko.prenosil@finteh.hr
In reply to: lec (#1)
hackersgeneral
Re: Using index for "like 'ABC%'" type query

On Friday 18 April 2003 02:56, lec wrote:

For current version of postgresql, is it possible for a query like below
to utilize an index?

select item_name from item where item_code like 'ABC%';

Earlier versions of postgresql, eg, 6.5.3 can utilize the index.

I think some other DB projects use something that is called
"BEGINS WITH" operator. Some time ago I tried to write such operator for
postgres. Unfortunatelly at that time I had no experience with postgres
internals, so the functions are written in PL/PSQL !!! They are working all
right, but the whole thing should be rewriten in C. If You are interested,
I'll send You the "code". For us using non "C" collations, I do not know for
other way. Of course if "C" collation satisfy Your needs, You do not need to
do such "hacks".

Regards !

#10Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#5)
hackersgeneral
Re: default locale considered harmful?

Andrew Sullivan <andrew@libertyrms.info> writes:

[ can't LIKE use an index? ]

Yes. But locale is enabled by default now, and if you use anything
other than C (and other-than-C is increasingly the default on the
supported platforms), it won't work. Just make sure that your locale
is C when you do initdb.

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

I agree with that initdb should by default initialize the database in
C locale. I have found on a local list too many users in Japan are
suffered by the locale problem and I'm getting tired of saying "you
should not forget to explicitly specify --no-locale siwtch".
--
Tatsuo Ishii

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#5)
hackersgeneral
Re: default locale considered harmful? (was Re: [GENERAL]

Tom Lane writes:

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

If you're concerned about speed, let's think about fixing the real
problems, not about disabling the feature altogether. A while ago I
proposed an easy solution that made LIKE use an index based on strxfrm
order instead. It was rejected on the grounds that it would prevent a
future enhancement of the LIKE mechanism to use the locale-enabled
collation order, but no one seems to be seriously interested in
implementing that. I still have the patch; we can reconsider it if you
like.

(Btw., LIKE using the locale-enabled collation sequence is hardly going to
work, because most locales compare strings backwards from the end to the
start in the second pass, so something like LIKE 'foo%' can easily give
inconsistent results, since you don't know what the end of the string
really is. It's better to think of pattern matching as
character-by-character matching.)

--
Peter Eisentraut peter_e@gmx.net

#12Andrew Sullivan
andrew@libertyrms.info
In reply to: Tom Lane (#5)
hackersgeneral
Re: default locale considered harmful? (was Re: Using index for "like 'ABC%'" type query)

On Fri, Apr 18, 2003 at 10:48:21AM -0400, Tom Lane wrote:

prevalence of en_US as a platform locale setting. Comments?

I tend to agree, especially since there's no real fix after you've
initdb'd.

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
#13Mike Castle
dalgoda@ix.netcom.com
In reply to: Andrew Sullivan (#4)
hackersgeneral
Re: default locale considered harmful? (was Re: Using index for "like 'ABC%'" type query)

In article <3954.1050677301@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

And who's brilliant idea was that anyway? Grrr. 'file' should NOT sort
next to 'FILE'.

mrc [more than slightly annoyed and off topic]
--
Mike Castle dalgoda@ix.netcom.com www.netcom.com/~dalgoda/
We are all of us living in the shadow of Manhattan. -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

#14Peter Eisentraut
peter_e@gmx.net
In reply to: Mike Castle (#13)
hackersgeneral
Re: default locale considered harmful? (was Re: Using

Mike Castle writes:

And who's brilliant idea was that anyway? Grrr. 'file' should NOT sort
next to 'FILE'.

Any reputable dictionary in the world would disagree with you.

--
Peter Eisentraut peter_e@gmx.net

#15Olleg Samojlov
olleg@telecom.mipt.ru
In reply to: Tom Lane (#5)
hackersgeneral
Re: default locale considered harmful? (was Re: [GENERAL] Using

Tom Lane wrote:

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

Americans will be agree with you. :-)

IMHO:

LC_CTYPE -\
LC_COLLATE - Ideally make client depending for international databases.
But it seems impossible to use indexes. May be add parameter to CREATE
INDEX, which locale (locales) use for sorting text data?

LC_NUMERIC client depending
LC_TIME client depending, default ISO
LC_MONETARY not usefull, lacking monetary type
LC_MESSAGES client depending for notice (client messages), server
depending for console or syslog messages, agree with default C

Summary: Exchange information with local services (stdout, syslog,
files, etc) must be according local locale. With client - according
client locale.

--
Olleg

#16Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#11)
hackersgeneral
Re: default locale considered harmful? (was Re: [GENERAL]

Has the single-byte LIKE penalty been eliminated, so we don't need to
consider using C as the default locale for initdb, right?

If fixed, how was it done?

---------------------------------------------------------------------------

Peter Eisentraut wrote:

Tom Lane writes:

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

If you're concerned about speed, let's think about fixing the real
problems, not about disabling the feature altogether. A while ago I
proposed an easy solution that made LIKE use an index based on strxfrm
order instead. It was rejected on the grounds that it would prevent a
future enhancement of the LIKE mechanism to use the locale-enabled
collation order, but no one seems to be seriously interested in
implementing that. I still have the patch; we can reconsider it if you
like.

(Btw., LIKE using the locale-enabled collation sequence is hardly going to
work, because most locales compare strings backwards from the end to the
start in the second pass, so something like LIKE 'foo%' can easily give
inconsistent results, since you don't know what the end of the string
really is. It's better to think of pattern matching as
character-by-character matching.)

--
Peter Eisentraut peter_e@gmx.net

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#16)
hackersgeneral
Re: default locale considered harmful? (was Re: [GENERAL]

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Has the single-byte LIKE penalty been eliminated, so we don't need to
consider using C as the default locale for initdb, right?

I'm still of the opinion that we should make C the default locale.
But I'm not sure where the consensus is, so I've not made the change.

If fixed, how was it done?

Peter has provided a hack whereby one can create a LIKE-supporting index
in a non-C locale. But a *default* index in a non-C locale is still not
going to support LIKE ... and the hacked index will not support ordinary
comparison or ordering operators. So I think there's still a lot left
to be desired here.

regards, tom lane

#18Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#17)
hackersgeneral
Re: default locale considered harmful? (was Re: [GENERAL]

Tom Lane writes:

Peter has provided a hack whereby one can create a LIKE-supporting index
in a non-C locale. But a *default* index in a non-C locale is still not
going to support LIKE ... and the hacked index will not support ordinary
comparison or ordering operators. So I think there's still a lot left
to be desired here.

I don't understand why you call this a hack. Pattern matching and string
comparison simply work differently, so the proper solution is to use
different operator classes. After all, that's what operator classes exist
for. What is left to be desired?

--
Peter Eisentraut peter_e@gmx.net

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#18)
hackersgeneral
Re: default locale considered harmful? (was Re: [GENERAL]

Peter Eisentraut <peter_e@gmx.net> writes:

I don't understand why you call this a hack. Pattern matching and string
comparison simply work differently, so the proper solution is to use
different operator classes. After all, that's what operator classes exist
for. What is left to be desired?

I think that a more general solution would be the ability to select a
locale (and hence a sort order) per-column, as the SQL spec envisions.
Then you'd just select C locale for columns you wanted to do pattern
matching for.

Admittedly, you'd still need the opclass-based approach for cases where
you wanted both pattern matching and a non-C-locale sort order ... but
I doubt that constitutes the majority of cases.

I guess my main concern is that we should not feel that this approach
takes the heat off us to support multiple locales. As a solution to the
narrow problem of LIKE performance, it's okay --- but it's not getting
us any nearer to a solution to the general locale problem.

regards, tom lane

#20Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#17)
hackersgeneral
Re: default locale considered harmful? (was Re: [GENERAL]

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Has the single-byte LIKE penalty been eliminated, so we don't need to
consider using C as the default locale for initdb, right?

I'm still of the opinion that we should make C the default locale.
But I'm not sure where the consensus is, so I've not made the change.

If fixed, how was it done?

Peter has provided a hack whereby one can create a LIKE-supporting index
in a non-C locale. But a *default* index in a non-C locale is still not
going to support LIKE ... and the hacked index will not support ordinary
comparison or ordering operators. So I think there's still a lot left
to be desired here.

So, my understanding is that you would create something such as:

CREATE INDEX iix ON tab (LIKE col)

and that does LIKE lookups and knows how to do col LIKE 'abc%', but it
can't be used for >= or ORDER BY, but it can be used for equality tests?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#20)
hackersgeneral
#22Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#21)
hackersgeneral
#23Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#19)
hackersgeneral
#24Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#21)
hackersgeneral
#25Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#22)
hackersgeneral
#26Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#25)
hackersgeneral
#27Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#26)
hackers
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#27)
hackers
#29Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#28)
hackers
#30Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Tom Lane (#28)
hackers
#31Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: Tom Lane (#28)
hackers
#32scott.marlowe
scott.marlowe@ihs.com
In reply to: Nigel J. Andrews (#31)
hackers
#33Peter Eisentraut
peter_e@gmx.net
In reply to: scott.marlowe (#32)
hackers
#34scott.marlowe
scott.marlowe@ihs.com
In reply to: Peter Eisentraut (#33)
hackers
#35Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: scott.marlowe (#32)
hackers
#36scott.marlowe
scott.marlowe@ihs.com
In reply to: Alvaro Herrera (#35)
hackers
#37Peter Eisentraut
peter_e@gmx.net
In reply to: scott.marlowe (#34)
hackers
#38scott.marlowe
scott.marlowe@ihs.com
In reply to: Peter Eisentraut (#37)
hackers
#39scott.marlowe
scott.marlowe@ihs.com
In reply to: scott.marlowe (#38)
hackers
#40Bruce Momjian
bruce@momjian.us
In reply to: scott.marlowe (#39)
hackers