Regex query not using index

Started by Postgres Userabout 18 years ago18 messagesgeneral
Jump to latest
#1Postgres User
postgres.developer@gmail.com

I'm running a simple query on 8.2. With this syntax, Explain indicate
that the index is scanned:
select * from eod where name = 'AA'

However, when I change the query to use simple regex:
select * from eod where name ~ 'AA'

now Explain indicates a seq scan:
Index Scan using equity_eod_symbol_idx on equity_eod (cost=0.00..8.27
rows=1 width=149)
Index Cond: ((symbol)::text = 'AA'::text)

Is there any way to 'encourage' Postgres to hit the index when using
regex? Do I need to create a functional index or something?
Without the index in play, I really can't use regex on any of my larger tables.

#2Erik Jones
erik@myemma.com
In reply to: Postgres User (#1)
Re: Regex query not using index

On Feb 19, 2008, at 9:32 PM, Postgres User wrote:

I'm running a simple query on 8.2. With this syntax, Explain indicate
that the index is scanned:
select * from eod where name = 'AA'

However, when I change the query to use simple regex:
select * from eod where name ~ 'AA'

now Explain indicates a seq scan:
Index Scan using equity_eod_symbol_idx on equity_eod (cost=0.00..8.27
rows=1 width=149)
Index Cond: ((symbol)::text = 'AA'::text)

Is there any way to 'encourage' Postgres to hit the index when using
regex? Do I need to create a functional index or something?
Without the index in play, I really can't use regex on any of my
larger tables.

You need it to be anchored:

select * from eod where name ~ '^AA';

If you're looking to be able to use indexes for searches within a
string then, for 8.2, you'll need to check out tsearch2.

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#3Postgres User
postgres.developer@gmail.com
In reply to: Erik Jones (#2)
Re: Regex query not using index

Thanks, my dumb mistake.
I need to perform the equivalent of a WHERE clause OR expression using
regex to match exact strings.

_________________________________________________________________________

this example hits the index:
select * from eod where name ~ '^BA$'

but when I try to add another possible value to the regex, it does a row scan:
select * from eod where name ~ ^BA$|^AA$'

both of these statements return the right results, but the 2nd ignores
the index even though both values are left-anchored.

any workaround- this behavior doesn't seem to make sense

Show quoted text

On Feb 19, 2008 8:45 PM, Erik Jones <erik@myemma.com> wrote:

On Feb 19, 2008, at 9:32 PM, Postgres User wrote:

I'm running a simple query on 8.2. With this syntax, Explain indicate
that the index is scanned:
select * from eod where name = 'AA'

However, when I change the query to use simple regex:
select * from eod where name ~ 'AA'

now Explain indicates a seq scan:
Index Scan using equity_eod_symbol_idx on equity_eod (cost=0.00..8.27
rows=1 width=149)
Index Cond: ((symbol)::text = 'AA'::text)

Is there any way to 'encourage' Postgres to hit the index when using
regex? Do I need to create a functional index or something?
Without the index in play, I really can't use regex on any of my
larger tables.

You need it to be anchored:

select * from eod where name ~ '^AA';

If you're looking to be able to use indexes for searches within a
string then, for 8.2, you'll need to check out tsearch2.

Erik Jones

DBA | Emma(R)
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#4Chris
dmagick@gmail.com
In reply to: Postgres User (#3)
Re: Regex query not using index

Postgres User wrote:

Thanks, my dumb mistake.
I need to perform the equivalent of a WHERE clause OR expression using
regex to match exact strings.

_________________________________________________________________________

this example hits the index:
select * from eod where name ~ '^BA$'

but when I try to add another possible value to the regex, it does a row scan:
select * from eod where name ~ ^BA$|^AA$'

both of these statements return the right results, but the 2nd ignores
the index even though both values are left-anchored.

any workaround- this behavior doesn't seem to make sense

try changing it to

select * from eod where (name ~ '^BA$' or name ~ '^AA$')

though in this example they should both be name = 'XX' rather than regex'es.

--
Postgresql & php tutorials
http://www.designmagick.com/

#5Postgres User
postgres.developer@gmail.com
In reply to: Chris (#4)
Re: Regex query not using index

Yes that works, but the whole point of the exercise is replace many OR
statements with 1 regex expression. So it's not what I'm looking for.

Show quoted text

On Feb 19, 2008 9:16 PM, Chris <dmagick@gmail.com> wrote:

Postgres User wrote:

Thanks, my dumb mistake.
I need to perform the equivalent of a WHERE clause OR expression using
regex to match exact strings.

_________________________________________________________________________

this example hits the index:
select * from eod where name ~ '^BA$'

but when I try to add another possible value to the regex, it does a row scan:
select * from eod where name ~ ^BA$|^AA$'

both of these statements return the right results, but the 2nd ignores
the index even though both values are left-anchored.

any workaround- this behavior doesn't seem to make sense

try changing it to

select * from eod where (name ~ '^BA$' or name ~ '^AA$')

though in this example they should both be name = 'XX' rather than regex'es.

--
Postgresql & php tutorials
http://www.designmagick.com/

#6Chris
dmagick@gmail.com
In reply to: Postgres User (#5)
Re: Regex query not using index

Postgres User wrote:

Yes that works, but the whole point of the exercise is replace many OR
statements with 1 regex expression. So it's not what I'm looking for.

Why do you want it done this way?

You can build an array of strings to check and use an in clause.

Using php :

$checks = array('AA', 'BA');

$query = "select * from table where name in ('" . implode("','",
$checks) . "')";

and it should use an index (up to a point anyway).

--
Postgresql & php tutorials
http://www.designmagick.com/

#7Postgres User
postgres.developer@gmail.com
In reply to: Chris (#6)
Re: Regex query not using index

im trying to allow the client to pass a varchar param into my
function, and want to avoid any parsing of the parameter inside the
function, or code to build a sql string.

if the function can use this code, it will be compiled and optimized
(unlike a dynamic sql stirng)

select * from mytable where fielda ~ p_param

(where p_param is the input parameter)

Show quoted text

On Feb 19, 2008 9:34 PM, Chris <dmagick@gmail.com> wrote:

Postgres User wrote:

Yes that works, but the whole point of the exercise is replace many OR
statements with 1 regex expression. So it's not what I'm looking for.

Why do you want it done this way?

You can build an array of strings to check and use an in clause.

Using php :

$checks = array('AA', 'BA');

$query = "select * from table where name in ('" . implode("','",
$checks) . "')";

and it should use an index (up to a point anyway).

--

Postgresql & php tutorials
http://www.designmagick.com/

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Postgres User (#5)
Re: Regex query not using index

"Postgres User" <postgres.developer@gmail.com> writes:

Yes that works, but the whole point of the exercise is replace many OR
statements with 1 regex expression. So it's not what I'm looking for.

Unfortunately, Postgres is not as intelligent as you are. There is
no mechanism to rewrite a multi-branch regex condition into multiple
indexscans. I recommend going back to the OR's.

regards, tom lane

#9Postgres User
postgres.developer@gmail.com
In reply to: Tom Lane (#8)
Re: Regex query not using index

doh! tom, let me know if you decide to hack out a fix for this one of
these nights ;)
thanks for your help.

Show quoted text

On Feb 19, 2008 9:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Postgres User" <postgres.developer@gmail.com> writes:

Yes that works, but the whole point of the exercise is replace many OR
statements with 1 regex expression. So it's not what I'm looking for.

Unfortunately, Postgres is not as intelligent as you are. There is
no mechanism to rewrite a multi-branch regex condition into multiple
indexscans. I recommend going back to the OR's.

regards, tom lane

#10Postgres User
postgres.developer@gmail.com
In reply to: Chris (#6)
Re: Regex query not using index

by the way, your example works fine unless it's a null value or empty string
unfortunately, postgres isn't smart enough to know that the when
p_param below is null, that the WHERE condition can be ignored

select * from table where name in (Coalesce(p_param, name))

which is the same as: select * from table where name in (name)

postgres does a row scan on the above sql. too slow.

Show quoted text

On Feb 19, 2008 9:34 PM, Chris <dmagick@gmail.com> wrote:

Postgres User wrote:

Yes that works, but the whole point of the exercise is replace many OR
statements with 1 regex expression. So it's not what I'm looking for.

Why do you want it done this way?

You can build an array of strings to check and use an in clause.

Using php :

$checks = array('AA', 'BA');

$query = "select * from table where name in ('" . implode("','",
$checks) . "')";

and it should use an index (up to a point anyway).

--

Postgresql & php tutorials
http://www.designmagick.com/

#11Chris
dmagick@gmail.com
In reply to: Postgres User (#10)
Re: Regex query not using index

Postgres User wrote:

by the way, your example works fine unless it's a null value or empty string
unfortunately, postgres isn't smart enough to know that the when
p_param below is null, that the WHERE condition can be ignored

select * from table where name in (Coalesce(p_param, name))

which is the same as: select * from table where name in (name)

postgres does a row scan on the above sql. too slow.

If there's no where condition, postgres has to do a seq-scan anyway so
your argument is void :)

--
Postgresql & php tutorials
http://www.designmagick.com/

#12Postgres User
postgres.developer@gmail.com
In reply to: Tom Lane (#8)
Re: Regex query not using index

a final question: why does this syntax do a seq scan + filter:

select * from tablea where fielda = fielda -or- select * from
tablea where fielda in (fielda)

while this syntax results in no filter, seq scan only

select * from tablea where 1 = 1

it seems that both where clauses should be ignored by the optimizer-
or am i missing something

Show quoted text

On Feb 19, 2008 9:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Postgres User" <postgres.developer@gmail.com> writes:

Yes that works, but the whole point of the exercise is replace many OR
statements with 1 regex expression. So it's not what I'm looking for.

Unfortunately, Postgres is not as intelligent as you are. There is
no mechanism to rewrite a multi-branch regex condition into multiple
indexscans. I recommend going back to the OR's.

regards, tom lane

#13Martijn van Oosterhout
kleptog@svana.org
In reply to: Postgres User (#12)
Re: Regex query not using index

On Wed, Feb 20, 2008 at 12:56:54AM -0800, Postgres User wrote:

a final question: why does this syntax do a seq scan + filter:

select * from tablea where fielda = fielda -or- select * from
tablea where fielda in (fielda)

while this syntax results in no filter, seq scan only

select * from tablea where 1 = 1

it seems that both where clauses should be ignored by the optimizer-
or am i missing something

WHERE fielda = fielda will only match non-null rows...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Those who make peaceful revolution impossible will make violent revolution inevitable.
-- John F Kennedy

#14Tino Wildenhain
tino@wildenhain.de
In reply to: Postgres User (#7)
Re: Regex query not using index

Postgres User wrote:

im trying to allow the client to pass a varchar param into my
function, and want to avoid any parsing of the parameter inside the
function, or code to build a sql string.

if the function can use this code, it will be compiled and optimized
(unlike a dynamic sql stirng)

select * from mytable where fielda ~ p_param

No, you should never let users specify raw regex. at best they can
hog down your server. Regex is a state engine and you can create
endless loops.

Maybe we can see the overall picture of your query?

Regards
Tino

#15Postgres User
postgres.developer@gmail.com
In reply to: Tino Wildenhain (#14)
Re: Regex query not using index

Tino,

My users are developers and the goal was to accept a simple
comma-delimited list of string values as a function's input parameter.
The function would then parse this input param into a valid regex
expression.

I was trying to write a function that lets me avoid using Execute
<string> and instead write in-line SQL with all the benefits of
pre-compilation and optimization.

Regex offers such a technique- IF it could understand regex that
represented a set of logical ORs and do an index scan (my rule is to
avoid seq-scans)

An example of regex that allows you to use in-line SQL with a
condition equivalent to many OR conditions when using basic comparison
operators:

select * from table1
where name ~ '.*' '^Smith$' |^Jones$':

And this works very well- except for the seq scan instead of an index scan

Show quoted text

On Feb 20, 2008 2:31 AM, Tino Wildenhain <tino@wildenhain.de> wrote:

Postgres User wrote:

im trying to allow the client to pass a varchar param into my
function, and want to avoid any parsing of the parameter inside the
function, or code to build a sql string.

if the function can use this code, it will be compiled and optimized
(unlike a dynamic sql stirng)

select * from mytable where fielda ~ p_param

No, you should never let users specify raw regex. at best they can
hog down your server. Regex is a state engine and you can create
endless loops.

Maybe we can see the overall picture of your query?

Regards
Tino

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Postgres User (#15)
Re: Regex query not using index

"Postgres User" <postgres.developer@gmail.com> writes:

My users are developers and the goal was to accept a simple
comma-delimited list of string values as a function's input parameter.
The function would then parse this input param into a valid regex
expression.

Why are you fixated on this being a regex? If you aren't actually
trying to expose regex capabilities to the users, you'll just be having
to suppress a bunch of strange behaviors for special characters.

ISTM that the best solution is to use an array-of-text parameter,
along the lines of

where name = any (array['Smith', 'Jones', ...])

For what you're doing, you'd not actually want the array[] syntax,
it would look more like

where name = any ('{Smith,Jones}'::text[])

This should optimize into an indexscan in 8.2 or later.

regards, tom lane

#17Postgres User
postgres.developer@gmail.com
In reply to: Tom Lane (#16)
Re: Regex query not using index

Tom,

I was looking for another approach but didn't come across that array
syntax in my searches (perhaps because it's newer. Thanks for a
solution.

Now to end my fixation, one last item. What about the case of a null
or empty param value- is there a way to assign a condition value that
Postgres will ignore when processing the query?

This syntax results in a seq scan: WHERE fielda = Coalesce(param, fielda)
because it applies only to non-nulls

Is there another way to write this- perhaps using your array syntax on
an empty array? Basically I'd PG to ignore the condition just as it
ignores WHERE 1 = 1

Show quoted text

On Wed, Feb 20, 2008 at 8:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Postgres User" <postgres.developer@gmail.com> writes:

My users are developers and the goal was to accept a simple
comma-delimited list of string values as a function's input parameter.
The function would then parse this input param into a valid regex
expression.

Why are you fixated on this being a regex? If you aren't actually
trying to expose regex capabilities to the users, you'll just be having
to suppress a bunch of strange behaviors for special characters.

ISTM that the best solution is to use an array-of-text parameter,
along the lines of

where name = any (array['Smith', 'Jones', ...])

For what you're doing, you'd not actually want the array[] syntax,
it would look more like

where name = any ('{Smith,Jones}'::text[])

This should optimize into an indexscan in 8.2 or later.

regards, tom lane

#18Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: Postgres User (#17)
Re: Regex query not using index

On Feb 20, 2008, at 5:51 PM, Postgres User wrote:

Now to end my fixation, one last item. What about the case of a null
or empty param value- is there a way to assign a condition value that
Postgres will ignore when processing the query?

This syntax results in a seq scan: WHERE fielda = Coalesce(param,
fielda)
because it applies only to non-nulls

Is there another way to write this- perhaps using your array syntax on
an empty array? Basically I'd PG to ignore the condition just as it
ignores WHERE 1 = 1

Not sure whether you want no rows returned if param is NULL or all
rows (all rows looking at your example), but you could simply check
param for NULL before comparing it, so either:

WHERE param IS NOT NULL AND fielda = param

or

WHERE param IS NULL OR fielda = param

In the second case, if param IS NULL you will get a sequential scan
of course, as that's the most efficient way to return all rows.

On Wed, Feb 20, 2008 at 8:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Postgres User" <postgres.developer@gmail.com> writes:

My users are developers and the goal was to accept a simple
comma-delimited list of string values as a function's input
parameter.
The function would then parse this input param into a valid regex
expression.

Why are you fixated on this being a regex? If you aren't actually
trying to expose regex capabilities to the users, you'll just be
having
to suppress a bunch of strange behaviors for special characters.

ISTM that the best solution is to use an array-of-text parameter,
along the lines of

where name = any (array['Smith', 'Jones', ...])

For what you're doing, you'd not actually want the array[] syntax,
it would look more like

where name = any ('{Smith,Jones}'::text[])

This should optimize into an indexscan in 8.2 or later.

regards, tom lane

---------------------------(end of
broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,47c1522f233091890169212!