indices: ~* / text_ops

Started by Bernhard Lorenzover 27 years ago2 messageshackers
Jump to latest
#1Bernhard Lorenz
bernhard.lorenz@iconsult.at

hullo,

ive two serious problems with postgresql (latest version):

1.) i seem to be unable (well, i _am_ ;-) to create an index
on a text field and then have that index being used if
i perform a search using ~*, ~~, and related operators.
i found out that these operators can only successfully
be implemented if i use box* field types etc. this is a real
pain, since i have that database with more than 70,000
entries and it always does a sequential scan.
a query might look like
"select * from table where field ~* 'string'" or
"select * from table where field ~~ '%string%' etc.
there are indices (hash, btree, (field text_ops)), but
they wont be used.
can anybody of you possibly help me any further on this issue?

2.) another thing i noticed is that while "~*" is supposed to
perform a case insensitive search, it does not. i havent
checked too much into the bug behind it (like "all ascii
values > 127 ..."), but the typical austrian and german
characters like "ae" with two dots (forgot the terminologically
exact name, forgive me) etc. will not be searched
properly, thus, if i have a field entry with
'AEyadayada'
and perform a search with
... ~* 'aeyadayada',
it wont find anything, i have to use ~* 'AEyadayada'.
opinions on that one woul dbe appreciated as well!

best regards,

+bl.

--
Bernhard Lorenz Managing Partner
!C Internet Consult http://www.iconsult.at/
Pacassistrasse 32, A-1130 Wien +43/1/319 09 90
Aichholzgasse 6/5, A-1120 Wien +43/1/817 39 23

#2Bruce Momjian
bruce@momjian.us
In reply to: Bernhard Lorenz (#1)
Re: [HACKERS] indices: ~* / text_ops

[Charset ISO-8859-1 unsupported, filtering to ASCII...]

hullo,

ive two serious problems with postgresql (latest version):

1.) i seem to be unable (well, i _am_ ;-) to create an index
on a text field and then have that index being used if
i perform a search using ~*, ~~, and related operators.
i found out that these operators can only successfully
be implemented if i use box* field types etc. this is a real
pain, since i have that database with more than 70,000
entries and it always does a sequential scan.
a query might look like
"select * from table where field ~* 'string'" or
"select * from table where field ~~ '%string%' etc.
there are indices (hash, btree, (field text_ops)), but
they wont be used.
can anybody of you possibly help me any further on this issue?

Indexes on strings can only be used if the start of the string is
anchored with ^ because the index only sorts the strings starting with
the first character.

I have added this to the FAQ, which is on the web site.

2.) another thing i noticed is that while "~*" is supposed to
perform a case insensitive search, it does not. i havent
checked too much into the bug behind it (like "all ascii
values > 127 ..."), but the typical austrian and german
characters like "ae" with two dots (forgot the terminologically
exact name, forgive me) etc. will not be searched
properly, thus, if i have a field entry with
'AEyadayada'
and perform a search with
... ~* 'aeyadayada',
it wont find anything, i have to use ~* 'AEyadayada'.
opinions on that one woul dbe appreciated as well!

Not sure how we handle this. I don't think we handle this, but we
should, but it is hard.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026