citext like query and index usage

Started by Tore Halvorsenover 16 years ago5 messagesgeneral

tore.halvorsen@gmail.com

over 16 years ago

Hi,

Is it possible to use an index for like queries on a citext column?
I'm using pg 8.4.1 on windows - with no changes to the default configuration.

For example:

CREATE TABLE test ( citext citext NOT NULL );
INSERT INTO test select md5(random()::text) FROM generate_series(0, 1000000, 1);
CREATE INDEX test_citext_idx ON test USING btree(citext);
vacuum analyze test;

explain analyze select * from test where citext like '5555%'
...
Seq Scan on test (cost=0.00..20834.03 rows=5000 width=33)
(actual time=45.916..3691.540 rows=16 loops=1)
Filter: (citext ~~ '5555%'::citext)
Total runtime: 3691.676 ms

set enable_seqscan = off;
explain analyze select * from test where citext like '5555%'
...
Seq Scan on test (cost=10000000000.00..10000020834.03 rows=5000 width=33)
(actual time=45.578..3761.687 rows=16 loops=1)
Filter: (citext ~~ '5555%'::citext)
Total runtime: 3761.860 ms

With equal I'm getting an index scan
explain analyze select * from test where citext =
'55559cb65689f035766eb69ed615afd4'
Index Scan using test_citext_idx on test (cost=0.00..8.56 rows=1 width=33)
(actual time=0.452..0.462 rows=1 loops=1)
Index Cond: (citext = '55559cb65689f035766eb69ed615afd4'::citext)
Total runtime: 0.558 ms

So, is there any way to get the like queries to use the index?

--
Eld på åren og sol på eng gjer mannen fegen og fjåg. [Jøtul]
<demo> 2009 Tore Halvorsen || +052 0553034554

Emanuel Calvo Franco

postgres.arg@gmail.com

over 16 years ago

In reply to: Tore Halvorsen (#1)

Re: citext like query and index usage

Is it possible to use an index for like queries on a citext column?
I'm using pg 8.4.1 on windows - with no changes to the default configuration.

For example:

CREATE TABLE test ( citext citext NOT NULL );
INSERT INTO test select md5(random()::text) FROM generate_series(0, 1000000, 1);
CREATE INDEX test_citext_idx ON test USING btree(citext);
vacuum analyze test;

explain analyze select * from test where citext like '5555%'
...
Seq Scan on test (cost=0.00..20834.03 rows=5000 width=33)
(actual time=45.916..3691.540 rows=16 loops=1)
Filter: (citext ~~ '5555%'::citext)
Total runtime: 3691.676 ms

set enable_seqscan = off;
explain analyze select * from test where citext like '5555%'
...
Seq Scan on test (cost=10000000000.00..10000020834.03 rows=5000 width=33)
(actual time=45.578..3761.687 rows=16 loops=1)
Filter: (citext ~~ '5555%'::citext)
Total runtime: 3761.860 ms

With equal I'm getting an index scan
explain analyze select * from test where citext =
'55559cb65689f035766eb69ed615afd4'
Index Scan using test_citext_idx on test (cost=0.00..8.56 rows=1 width=33)
(actual time=0.452..0.462 rows=1 loops=1)
Index Cond: (citext = '55559cb65689f035766eb69ed615afd4'::citext)
Total runtime: 0.558 ms

So, is there any way to get the like queries to use the index?

I don't know if it is a good practice (in this case), but you can
create an index per value
(expressional indexes).

CREATE INDEX xx ON table (citext_column ) WHERE citext_column ~~ '5555%';

But IMHO if you are interest only in the firsts values (example 4)
you can create an index using hash_text function:

CREATE INDEX xx ON table (hashtext(substring(citext_col,1,4)));
--disable seqscan for little tables
explain select * from pp where (hashtext(substring(i,1,4))) = hashtext('0.06');

Without hashtext:
CREATE INDEX xx ON table (substring(citext_col,1,4));
explain select * from pp where substring(i,1,4) = '0.06';

The entire field to search:
CREATE INDEX xx ON table (hashtext(citext_col));
explain select * from pp where hashtext(i) = hashtext('all the field here');

It is useful?

--
Emanuel Calvo Franco
DBA at: www.siu.edu.ar
www.emanuelcalvofranco.com.ar

Alvaro Herrera

alvherre@2ndquadrant.com

over 16 years ago

In reply to: Tore Halvorsen (#1)

Re: citext like query and index usage

Tore Halvorsen escribiï¿½:

Hi,

Is it possible to use an index for like queries on a citext column?
I'm using pg 8.4.1 on windows - with no changes to the default configuration.

For example:

CREATE TABLE test ( citext citext NOT NULL );
INSERT INTO test select md5(random()::text) FROM generate_series(0, 1000000, 1);
CREATE INDEX test_citext_idx ON test USING btree(citext);

Hmm, I think this needs one of the *_pattern_ops indexes. I'm not sure
if you can use the builtin ones with citext though.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Tore Halvorsen

tore.halvorsen@gmail.com

over 16 years ago

In reply to: Alvaro Herrera (#3)

Re: citext like query and index usage

On Tue, Sep 22, 2009 at 8:49 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Tore Halvorsen escribió:

Hi,

Is it possible to use an index for like queries on a citext column?
I'm using pg 8.4.1 on windows - with no changes to the default configuration.

For example:

CREATE TABLE test ( citext citext NOT NULL );
INSERT INTO test select md5(random()::text) FROM generate_series(0, 1000000, 1);
CREATE INDEX test_citext_idx ON test USING btree(citext);

Hmm, I think this needs one of the *_pattern_ops indexes. I'm not sure
if you can use the builtin ones with citext though.

Yeah, I started looking at something like that - sadly after I sent my mail.

Looks like I'm stuck with lower(text_col) for now...

Thanks anyway :)

--
Eld på åren og sol på eng gjer mannen fegen og fjåg. [Jøtul]
<demo> 2009 Tore Halvorsen || +052 0553034554

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Tore Halvorsen (#4)

Re: citext like query and index usage

Tore Halvorsen <tore.halvorsen@gmail.com> writes:

On Tue, Sep 22, 2009 at 8:49 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Tore Halvorsen escribi�:

Is it possible to use an index for like queries on a citext column?

Hmm, I think this needs one of the *_pattern_ops indexes. �I'm not sure
if you can use the builtin ones with citext though.

Looks like I'm stuck with lower(text_col) for now...

Afraid so --- there are special cases in the planner for LIKE, and that
code only knows about the built-in types. Sometime we should figure out
how to push that logic out to datatype-specific code so it can be
more extensible.

regards, tom lane