sequential scans and the like operator

Started by Roderick A. Andersonover 24 years ago8 messagesgeneral
Jump to latest
#1Roderick A. Anderson
raanders@tincan.org

There is a discussion going on on the sql-ledger mailing list concerning
whether indexes will provide any performance improvements. The one that
caught my eye was whether using LIKE in a statement would force a
sequential scan.

I tried checking the PG list archives but fts.postgresql.org is a little
slow - as many know - plus a quick search using google didn't turn up
much.

My guesstimation is a leading or mid-field wildcard (_, %) would force a
sequential scan but a trailing wildcard would not (should not?).

I think (seem to remember) that this was discussed either here or on
the HACKERS list. A pointer to the appropriate message/thread would be
great.

TIA,
Rod
--
Let Accuracy Triumph Over Victory

Zetetic Institute
"David's Sling"
Marc Stiegler

#2Adam Haberlach
adam@newsnipple.com
In reply to: Roderick A. Anderson (#1)
Re: sequential scans and the like operator

On Tue, Jan 08, 2002 at 04:50:53AM -0800, Roderick A. Anderson wrote:

There is a discussion going on on the sql-ledger mailing list concerning
whether indexes will provide any performance improvements. The one that
caught my eye was whether using LIKE in a statement would force a
sequential scan.

I tried checking the PG list archives but fts.postgresql.org is a little
slow - as many know - plus a quick search using google didn't turn up
much.

My guesstimation is a leading or mid-field wildcard (_, %) would force a
sequential scan but a trailing wildcard would not (should not?).

I, myself, don't remember the thread exactly, but I'm pretty sure that
was the behavior that we were told to expect...

--
Adam Haberlach | Who buys an eight-processor machine and then
adam@newsnipple.com | watches 30 movies on it all at the same time?
http://newsnipple.com | Beats me. They told us they could sell it, so
| we made it. -- George Hoffman, Be Engineer

#3Roderick A. Anderson
raanders@tincan.org
In reply to: Adam Haberlach (#2)
Re: sequential scans and the like operator

On Tue, 8 Jan 2002, Dave Trombley wrote:

You can always check exaclty what's being done in your queries by
using the EXPLAIN command. For example, to test your hypothesis:

From your example I'm assuming I don't need data to test the optimizer.
Cool! This is why I asked (and a little laziness) instead of trying it.
No real data to populate the table with.

Thanks for the idea/knowledge.

Rod
--
Let Accuracy Triumph Over Victory

Zetetic Institute
"David's Sling"
Marc Stiegler

#4Dave Trombley
dtrom@bumba.net
In reply to: Roderick A. Anderson (#1)
Re: sequential scans and the like operator

Roderick A. Anderson wrote:

There is a discussion going on on the sql-ledger mailing list concerning
whether indexes will provide any performance improvements. The one that
caught my eye was whether using LIKE in a statement would force a
sequential scan.

You can always check exaclty what's being done in your queries by
using the EXPLAIN command. For example, to test your hypothesis:

------------------
test=# create table foo(f text, i int);
CREATE
test=# create index foo_t on foo (f);
CREATE
test=# explain select * from foo where f like '%uiop%';
NOTICE: QUERY PLAN:

Seq Scan on foo (cost=0.00..22.50 rows=1 width=36)

EXPLAIN
test=# explain select * from foo where f = '%uiop%';
NOTICE: QUERY PLAN:

Index Scan using foo_t on foo (cost=0.00..17.07 rows=5 width=36)

EXPLAIN
---------------------
-dj trombley
<dtrom@bumba.net>

#5Jeffrey W. Baker
jwbaker@acm.org
In reply to: Roderick A. Anderson (#3)
Re: sequential scans and the like operator

On Tue, 8 Jan 2002, Roderick A. Anderson wrote:

On Tue, 8 Jan 2002, Dave Trombley wrote:

You can always check exaclty what's being done in your queries by
using the EXPLAIN command. For example, to test your hypothesis:

From your example I'm assuming I don't need data to test the optimizer.

Cool! This is why I asked (and a little laziness) instead of trying it.
No real data to populate the table with.

Thanks for the idea/knowledge.

Ah, no that actually isn't true. For example, I believe that Pg will
always Seq Scan a table that has never been vacuum analyzed, or is empty,
or has only a few rows.

-jwb

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Trombley (#4)
Re: sequential scans and the like operator

Dave Trombley <dtrom@bumba.net> writes:

Roderick A. Anderson wrote:

There is a discussion going on on the sql-ledger mailing list concerning
whether indexes will provide any performance improvements. The one that
caught my eye was whether using LIKE in a statement would force a
sequential scan.

You can always check exaclty what's being done in your queries by
using the EXPLAIN command.

Alternatively, search the pgsql mailing list archives; LIKE performance
has been discussed more times than I care to think about. The present
state of play, I believe, is:

* LIKE and regexp match WHERE clauses are potentially indexable if the
pattern specifies a fixed prefix of one or more characters. The longer
the fixed prefix, the more selective the index condition (and hence the
greater the probability the planner will choose to use it).

As examples:

foo LIKE 'abc%bar' indexable (prefix is abc)
foo LIKE '_abc%bar' not indexable (first character not fixed)
foo ~ 'abc' not indexable (pattern not anchored left)
foo ~ '^abc' indexable (prefix is abc)
foo ILIKE 'abc%' not indexable (1st char could be A or a)

* If Postgres was compiled with LOCALE support and is using a non-C
locale, <blech>LIKE indexing is disabled</blech> because the sort order
of the index may not agree with what's needed to perform LIKE searches.

This last point is rather nasty since non-C locales are rapidly becoming
the usual default, even in the USA. Linuxen tend to default to en_US
locale, for example.

regards, tom lane

#7Andrew Sullivan
andrew@libertyrms.info
In reply to: Roderick A. Anderson (#3)
Re: sequential scans and the like operator

On Tue, Jan 08, 2002 at 10:45:25AM -0800, Roderick A. Anderson wrote:

On Tue, 8 Jan 2002, Dave Trombley wrote:

You can always check exaclty what's being done in your queries by
using the EXPLAIN command. For example, to test your hypothesis:

From your example I'm assuming I don't need data to test the optimizer.

Cool! This is why I asked (and a little laziness) instead of trying it.
No real data to populate the table with.

Uh, except that the results will be different depending on how much
data there is. If you only have three rows in the table, it's always
going to use a seqscan no matter what the indices are: it's cheaper
just to look at all the records than to read the index.

That's why VACUUM ANALYSE is so important, by the way.

A

-- 
----
Andrew Sullivan                               87 Mowat Avenue 
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110
#8Roderick A. Anderson
raanders@tincan.org
In reply to: Tom Lane (#6)
Re: sequential scans and the like operator

On Tue, 8 Jan 2002, Tom Lane wrote:

Alternatively, search the pgsql mailing list archives;

Yeah this was where I was trying to get but it was _way_ too slow and I
have AisleRiot, Gnome Stones, and Demineur to keep my happy while I
wait.

* If Postgres was compiled with LOCALE support and is using a non-C
locale, <blech>LIKE indexing is disabled</blech> because the sort order
of the index may not agree with what's needed to perform LIKE searches.

This last point is rather nasty since non-C locales are rapidly becoming
the usual default, even in the USA. Linuxen tend to default to en_US
locale, for example.

Got bitten here once.

Thanks for the more indepth explanation.

Best,
Rod
--
Let Accuracy Triumph Over Victory

Zetetic Institute
"David's Sling"
Marc Stiegler