Indexed leading substring searches - worked, now doesn't

Started by Wesabout 21 years ago4 messagesgeneral
Jump to latest
#1Wes
wespvp@syntegra.com

I know my leading substring searches used to be done via indexes. We
specifically tested that. Since the last time I tested it, the database has
probably been reloaded to fix a corruption problem. Now all I can get is
sequential leading substring searches. In the examples below, the database
was vacuumed last night.

The database is very large (currently about 100 GB, and will be 100's of
gigabytes), so performance is important. This particular table in the
example has only about 2.8 million rows.

PostgreSQL version is 7.4.5. Prior to the reload, it was 7.4.1.

The locale is showing up as en_US.iso885915. As far as I know, it was
always this (default RedHat install), so I don't understand why it worked
before. Did something change between 7.4.1 and 7.4.5? I supposed it's
possible that I specified locale=C on the original database and don't
remember that...

I'm not going to have to "initdb --locale=C" and am I? I looked at index
classes, and that doesn't appear to be something I want to do, due to
performance. What kind of performance hit do you actually take by using an
index class?

Wes

Pg_controldata shows:

Maximum length of locale name: 128
LC_COLLATE: en_US.iso885915
LC_CTYPE: en_US.iso885915

narc=> \d addresses
Table "public.addresses"
Column | Type | Modifiers
-------------+------------------------+-----------
address_key | numeric(12,0) | not null
address | character varying(255) | not null
Indexes:
"addresses_pkey" primary key, btree (address_key)
"addresses_i_address" btree (address)

narc=> select count(*) from addresses;
count
---------
2829640
(1 row)

narc=> explain select * from addresses where address = 'blah';
QUERY PLAN
----------------------------------------------------------------------------
----------
Index Scan using addresses_i_address on addresses (cost=0.00..2.81 rows=1
width=40)
Index Cond: ((address)::text = 'blah'::text)
(2 rows)

narc=> explain select * from addresses where address like 'blah%';
QUERY PLAN
--------------------------------------------------------------
Seq Scan on addresses (cost=0.00..61244.68 rows=2 width=40)
Filter: ((address)::text ~~ 'blah%'::text)
(2 rows)

narc=> explain analyze select * from addresses where address like 'blah%';
QUERY PLAN
----------------------------------------------------------------------------
----------------------------------
Seq Scan on addresses (cost=0.00..61244.68 rows=2 width=40) (actual
time=1445.386..8913.435 rows=6 loops=1)
Filter: ((address)::text ~~ 'blah%'::text)
Total runtime: 8913.504 ms
(3 rows)

Something else doesn't make sense.. I did the same query a few minutes ago
and the cost was totally different:

narc=> explain select * from addresses where address like 'blah%';
QUERY PLAN
--------------------------------------------------------------------------
Seq Scan on addresses (cost=100000000.00..100061244.67 rows=2 width=40)
Filter: ((address)::text ~~ 'blah%'::text)
(2 rows)

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wes (#1)
Re: Indexed leading substring searches - worked, now doesn't

Wes <wespvp@syntegra.com> writes:

The locale is showing up as en_US.iso885915. As far as I know, it was
always this (default RedHat install), so I don't understand why it worked
before. Did something change between 7.4.1 and 7.4.5? I supposed it's
possible that I specified locale=C on the original database and don't
remember that...

You got it in one ...

I'm not going to have to "initdb --locale=C" and am I?

You could use the alternative operator classes for pattern searches, but
if you want the same indexes to also serve for normal text sorting,
C locale is the better bet.

regards, tom lane

#3Wes
wespvp@syntegra.com
In reply to: Tom Lane (#2)
Re: Indexed leading substring searches - worked, now

On 2/3/05 2:29 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

You could use the alternative operator classes for pattern searches, but
if you want the same indexes to also serve for normal text sorting,
C locale is the better bet.

As an interim solution until I can reload the database (takes a weekend),
can I define two indexes on the same field, one using operator classes and
one not (and have them automatically used as appropriate)?

Because of the time involved, I'm trying to hold off on another reload until
we upgrade to 8.x.

Wes

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wes (#3)
Re: Indexed leading substring searches - worked, now doesn't

Wes <wespvp@syntegra.com> writes:

As an interim solution until I can reload the database (takes a weekend),
can I define two indexes on the same field, one using operator classes and
one not (and have them automatically used as appropriate)?

Certainly.

regards, tom lane