~* OR LIKE?

Started by Matt Friedmanover 25 years ago2 messagesgeneral

matt@daart.ca

over 25 years ago

I have a query that I could run a number of ways:
Like this:

index_word.word ~*'radio.*'
OR
index_word.word ~*'spry.*'

Or I could use LIKE and the "%" comparison

If I toss out the need for comparisions I could just use: IN
('radio','spry')

What's the relative performance for: "IN" "LIKE" or "~*"? I'm trying to
decide if the performance loss of doing extra string compares is worth
getting slighly more relevant results.

Matt Friedman

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Matt Friedman (#1)

Re: ~* OR LIKE?

"Matt Friedman" <matt@daart.ca> writes:

What's the relative performance for: "IN" "LIKE" or "~*"?

If you don't have an index on the target column, or if the pattern is
not constant (eg you took it from another table), then these will all
reduce to sequential-scan-and-examine-every-tuple. Simple equality
comparisons will take a little less CPU time than pattern matches, but
the tuple retrieval costs are going to dominate everything anyway.
In short: it hardly matters unless you can use an indexscan.

IN ('foo','bar','baz') reduces to x = 'foo' OR x = 'bar' OR x = 'baz'.
This can be done by a series of indexscans (one index probe per OR
clause), so it's pretty quick for small numbers of alternatives.

Case-sensitive LIKE and ~ can use indexscans if (at least part of) the
constant pattern is left-anchored. For example, x LIKE 'foo%bar' can
only match values beginning with 'foo', so an index scan over the range
of such values can be used. In regexp notation this'd be a pattern
anchored left with ^.

The above breaks down for case-insensitive matching, and it also breaks
down in non-C locales, where string sort ordering may not match the
semantics of pattern prefixes closely enough. So in those cases you are
back to sequential scan.

regards, tom lane