Index not used on select (Is this more OR + LIKE?)
I've been reading Ole's posts with great interest
as we've have just experienced similar problems
ourselves here. I opened up a Slip ;-) #5 but
I've narrowed down the difference between two
queries which illustrate the problem:
The first query:
SELECT DISTINCT
supplies.supply,supplies.supplyunit,
supplies.purchaseunit,supplies.vendor,
supplies.vendorgroup,supplies.vendoritem,
supplies.vendorname,supplies.description,
supplies.conversion,supplies.price,
supplies.inventory,supplies.commodity,
supplies.adddate
FROM
supplies,permitbuy,locations,supplychains,reserves
WHERE
permitbuy.webuser = 'mascarj' AND
(locations.company,locations.costcntr) =
(permitbuy.company, permitbuy.costcntr) AND
supplychains.target = locations.target AND
reserves.target = supplychains.supplysource AND
supplies.supply = reserves.supply AND
supplies.inventory = '1' AND
((upper(supplies.supply) LIKE '%SEQ%') OR
(upper(supplies.vendoritem) LIKE '%SEQ%') OR
(upper(supplies.vendorname) LIKE '%SEQ%') OR
(upper(supplies.description) LIKE '%SEQ%'))
ORDER BY
supplies.description;
The EXPLAIN shows its using indices as it should:
NOTICE: QUERY PLAN:
Unique (cost=24076.77 rows=8260854 width=220)
-> Sort (cost=24076.77 rows=8260854 width=220)
-> Hash Join (cost=24076.77 rows=8260854
width=220)
-> Hash Join (cost=1756.00 rows=597537
width=76)
-> Seq Scan on reserves
(cost=938.44 rows=20468 width=16)
-> Hash (cost=121.44 rows=475
width=60)
-> Hash Join (cost=121.44
rows=475 width=60)
-> Seq Scan on
supplychains (cost=49.28 rows=1251 width=8)
-> Hash (cost=26.80
rows=93 width=52)
-> Hash Join
(cost=26.80 rows=93 width=52)
-> Seq
Scan on locations (cost=10.09 rows=245 width=28)
-> Hash
(cost=5.78 rows=56 width=24)
->
Index Scan using k_permitbuy1 on permitbuy (cost=5.78
rows=56 width=24)
-> Hash (cost=1675.03 rows=17637
width=144)
-> Seq Scan on supplies
(cost=1675.03 rows=17637 width=144)
EXPLAIN
This query works as expected and returns within
a reasonable amount of time. However, if an OR
clause is introduced as below:
SELECT DISTINCT
supplies.supply,supplies.supplyunit,
supplies.purchaseunit,supplies.vendor,
supplies.vendorgroup,supplies.vendoritem,
supplies.vendorname,supplies.description,
supplies.conversion,supplies.price,
supplies.inventory,supplies.commodity,
supplies.adddate
FROM
supplies,permitbuy,locations,supplychains,reserves
WHERE
permitbuy.webuser = 'mascarj' AND
(locations.company,locations.costcntr) =
(permitbuy.company, permitbuy.costcntr) AND
supplychains.target = locations.target AND
reserves.target = supplychains.supplysource AND
supplies.supply = reserves.supply AND
supplies.inventory = '1' AND
((upper(supplies.supply) LIKE '%SEQ%') OR
(upper(supplies.vendoritem) LIKE '%SEQ%') OR
(upper(supplies.vendorname) LIKE '%SEQ%') OR
(upper(supplies.description) LIKE '%SEQ%'))
OR <-- This is built by our search engine to allow
-- our users to enter: [SEQ or SCD]...
((upper(supplies.supply) LIKE '%SCD%') OR
(upper(supplies.vendoritem) LIKE '%SCD%') OR
(upper(supplies.vendorname) LIKE '%SCD%') OR
(upper(supplies.description) LIKE '%SCD%'))
ORDER BY
supplies.description;
The EXPLAIN shows that it doesn't bother to use
the indices for ANY of the joins:
NOTICE: QUERY PLAN:
Unique (cost=63290466304.00 rows=1073741850
width=232)
-> Sort (cost=63290466304.00 rows=1073741850
width=232)
-> Nested Loop (cost=63290466304.00
rows=1073741850 width=232)
-> Nested Loop (cost=52461780992.00
rows=1073741851 width=204)
-> Nested Loop
(cost=28277893120.00 rows=1073741851 width=168)
-> Nested Loop
(cost=28033934.00 rows=573217107 width=160)
-> Seq Scan on supplies
(cost=1675.03 rows=29871 width=144)
-> Seq Scan on
reserves (cost=938.44 rows=20468 width=16)
-> Seq Scan on supplychains
(cost=49.28 rows=1251 width=8) ->
Seq Scan on permitbuy (cost=22.52 rows=531 width=36)
-> Seq Scan on locations (cost=10.09
rows=245 width=28)
EXPLAIN
The plan shows that it will have to perform a
sequential scan on the supplies table, which I
obviously expected because of the use of LIKE, in
both plans. However, why is it, that, when an
OR clause which exclusively references the supplies
table is appended to the query, the planner/optimizer
(which already must perform a sequential scan on
supplies) now totally ignores all the indices
built on the other tables? The result is an
execution plan which consumes all RAM on the machine,
and, at 410M, I killed it, because it was about to
consume all swap space as well...
Any help would be greatly appreciated
Mike Mascari (mascarim@yahoo.com)
_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com
Mike Mascari <mascarim@yahoo.com> writes:
... However, if an OR clause is introduced as below:
WHERE
permitbuy.webuser = 'mascarj' AND
(locations.company,locations.costcntr) =
(permitbuy.company, permitbuy.costcntr) AND
supplychains.target = locations.target AND
reserves.target = supplychains.supplysource AND
supplies.supply = reserves.supply AND
supplies.inventory = '1' AND
((upper(supplies.supply) LIKE '%SEQ%') OR
(upper(supplies.vendoritem) LIKE '%SEQ%') OR
(upper(supplies.vendorname) LIKE '%SEQ%') OR
(upper(supplies.description) LIKE '%SEQ%'))
OR <-- This is built by our search engine to allow
-- our users to enter: [SEQ or SCD]...
((upper(supplies.supply) LIKE '%SCD%') OR
(upper(supplies.vendoritem) LIKE '%SCD%') OR
(upper(supplies.vendorname) LIKE '%SCD%') OR
(upper(supplies.description) LIKE '%SCD%'))
The plan shows that it will have to perform a
sequential scan on the supplies table, which I
obviously expected because of the use of LIKE, in
both plans.
Not necessarily --- since you have a restriction clause on
supplies.inventory, an index on that field could be used for an index
scan. This would only be worthwhile if "supplies.inventory = '1'"
eliminates a goodly fraction of the supplies records, of course.
Another possibility is using an index on supplies.supply to implement
the join on supplies.supply = reserves.supply. The LIKE clauses will
certainly have to be done the hard way, but they don't necessarily
have to be done the hard way on every single record.
However, why is it, that, when an OR clause which exclusively
references the supplies table is appended to the query, the
planner/optimizer (which already must perform a sequential scan on
supplies) now totally ignores all the indices built on the other
tables?
I think the problem is that the OR appears at top level in the WHERE
clause (assuming the above is a verbatim transcript of your query).
OR groups less tightly than AND, so what this really means is
(other-conditions AND (LIKEs-for-SEQ)) OR (LIKEs-for-SCD)
which is undoubtedly not what you had in mind, and will certainly
produce a lot of unwanted records if the query manages to complete.
Every supplies tuple matching SCD will appear joined to every possible
combination of records from the other tables...
Per recent discussions, the query optimizer is currently doing a really
bad job of optimizing OR-of-ANDs conditions, but I think that you didn't
mean to ask for that anyway.
The result is an execution plan which consumes all RAM on the machine,
and, at 410M, I killed it, because it was about to consume all swap
space as well...
You're confusing two different problems --- the efficiency of the query
plan has a lot to do with speed, but relatively little to do with memory
usage. I think that the memory usage problem here stems from the use of
upper() in the WHERE condition. Each evaluation of upper() generates a
temporary result string, which is not reclaimed until end of statement
in the current code. (I hope to see that fixed in the next release or
two, but for now you gotta work around it.) You would be better advised
to use a case-insensitive match operator instead of LIKE. For example,
the above conditions could be written
supplies.supply ~* 'SEQ'
Dunno how inconvenient it is for you to use regular expression patterns
instead of LIKE-style patterns, but the memory savings would be
considerable. Even after we fix the memory leakage problem, I expect
this would be faster than the LIKE version.
BTW, the reason that your correctly-phrased query doesn't run out
of memory is that the LIKE conditions don't get evaluated for the
tuples that don't make it past the other qual conditions. In the
mistaken version, they get evaluated for every possible combination
of joined tuples...
regards, tom lane
Import Notes
Reply to msg id not found: YourmessageofFri23Jul1999121941-040019990723161942.21461.rocketmail@send205.yahoomail.com | Resolved by subject fallback
Thanks Tom,
You were correct, as always. The query was wrong.
I apologize for wasting your time. I needed to
encase the chained OR clauses in parens...I'll try
and purge that Slip.
Sorry,
Mike Mascari
mascarim@yahoo.com
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mike Mascari <mascarim@yahoo.com> writes:
... However, if an OR clause is introduced as
below:
I think the problem is that the OR appears at top
level in the WHERE
clause (assuming the above is a verbatim transcript
of your query).
OR groups less tightly than AND, so what this really
means is
(other-conditions AND (LIKEs-for-SEQ)) OR
(LIKEs-for-SCD)
which is undoubtedly not what you had in mind, and
will certainly
produce a lot of unwanted records if the query
manages to complete.
Every supplies tuple matching SCD will appear joined
to every possible
combination of records from the other tables...In
the
mistaken version, they get evaluated for every
possible combination
of joined tuples...regards, tom lane
_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com
Import Notes
Resolved by subject fallback