Why search term results different query plan?

Started by Erol Özover 24 years ago5 messagesgeneral

eroloz@esg.com.tr

over 24 years ago

Hi,
Could anybody give me hint, or reccomend a source about this:
Please note that two queries are different only in search terms (STAR and
A). Plan and performane difference between them confused me.
Thanks,
Erol

trollandtoad2=# explain SELECT DISTINCT p.* FROM product_t p,
product_detail_t pd WHERE ((pd.productid=p.productid) AND
(pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and
(p.Name LIKE '%STAR%' OR p.description LIKE '%STAR%' OR p.basesku LIKE
'%STAR%') order by p.name;
NOTICE: QUERY PLAN:

Unique (cost=1599.50..1599.58 rows=1 width=98)
-> Sort (cost=1599.50..1599.50 rows=2 width=98)
-> Nested Loop (cost=0.00..1599.48 rows=2 width=98)
-> Seq Scan on product_t p (cost=0.00..613.41 rows=1
width=94)
-> Seq Scan on product_detail_t pd (cost=0.00..983.19
rows=231 width=4)

[postgres@trollandtoad postgres]$ time psql -c "SELECT DISTINCT p.* FROM
product_t p, product_detail_t pd WHERE ((pd.productid=p.productid) AND
(pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and
(p.Name LIKE '%STAR%' OR p.description LIKE '%STAR%' OR p.basesku LIKE
'%STAR%') order by p.name;" -d trollandtoad2
real 4m24.500s
user 0m0.020s
sys 0m0.010s

------------------------------------------
EXPLAIN
trollandtoad2=# explain SELECT DISTINCT p.* FROM product_t p,
product_detail_t pd WHERE ((pd.productid=p.productid) AND
(pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and
(p.Name LIKE '%A%' OR p.description LIKE '%A%' OR p.basesku LIKE '%A%')
order by p.name;
NOTICE: QUERY PLAN:

Unique (cost=4456.37..5127.88 rows=1919 width=98)
-> Sort (cost=4456.37..4456.37 rows=19186 width=98)
-> Merge Join (cost=2389.21..2496.11 rows=19186 width=98)
-> Sort (cost=1396.97..1396.97 rows=8321 width=94)
-> Seq Scan on product_t p (cost=0.00..613.41
rows=8321 width=94)
-> Sort (cost=992.24..992.24 rows=231 width=4)
-> Seq Scan on product_detail_t pd (cost=0.00..983.19
rows=231 width=4)

EXPLAIN

[postgres@trollandtoad postgres]$ time psql -c "SELECT DISTINCT p.* FROM
product_t p, product_detail_t pd WHERE ((pd.productid=p.productid) AND
(pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and
(p.Name LIKE '%A%' OR p.description LIKE '%A%' OR p.basesku LIKE '%A%')
order by p.name;" -d trollandtoad2
real 0m6.284s
user 0m0.270s
sys 0m0.030s

Martijn van Oosterhout

kleptog@svana.org

over 24 years ago

In reply to: Erol Öz (#1)

Re: Why search term results different query plan?

On Sun, Sep 30, 2001 at 04:18:53AM +0300, Erol ?z wrote:

Hi,
Could anybody give me hint, or reccomend a source about this:
Please note that two queries are different only in search terms (STAR and
A). Plan and performane difference between them confused me.
Thanks,
Erol

%STAR%:

Unique (cost=1599.50..1599.58 rows=1 width=98)
-> Sort (cost=1599.50..1599.50 rows=2 width=98)
-> Nested Loop (cost=0.00..1599.48 rows=2 width=98)
-> Seq Scan on product_t p (cost=0.00..613.41 rows=1 width=94)
-> Seq Scan on product_detail_t pd (cost=0.00..983.19 rows=231 width=4)

%A%:

Unique (cost=4456.37..5127.88 rows=1919 width=98)
-> Sort (cost=4456.37..4456.37 rows=19186 width=98)
-> Merge Join (cost=2389.21..2496.11 rows=19186 width=98)
-> Sort (cost=1396.97..1396.97 rows=8321 width=94)
-> Seq Scan on product_t p (cost=0.00..613.41 rows=8321 width=94)
-> Sort (cost=992.24..992.24 rows=231 width=4)
-> Seq Scan on product_detail_t pd (cost=0.00..983.19 rows=231 width=4)

Note how in the first one it assumes that there is only one match in p
whereas in the second there are 8321. Are either of those correct?

I don't really see how you can do any real estimates on %X% type queries,
although maybe it's assuming longer string => less matches.

Not sure how to fix it though...

HTH,
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/

Show quoted text

Magnetism, electricity and motion are like a three-for-two special offer:
if you have two of them, the third one comes free.

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Erol Öz (#1)

Re: Why search term results different query plan?

"=?iso-8859-9?B?RXJvbCDWeg==?=" <eroloz@esg.com.tr> writes:

Could anybody give me hint, or reccomend a source about this:
Please note that two queries are different only in search terms (STAR and
A). Plan and performane difference between them confused me.

Note the difference in the estimated number of rows matched in
product_t:

-> Seq Scan on product_t p (cost=0.00..613.41 rows=1 width=94)

-> Seq Scan on product_t p (cost=0.00..613.41 rows=8321 width=94)

The planner thinks --- not unreasonably, IMHO --- that LIKE '%A%' will
match many more rows than LIKE '%STAR%'. Accordingly, it prepares
different plans for the two cases.

Since you're complaining, I suppose that LIKE '%STAR%' matches many more
rows than random chance would suggest, and so that plan turns out to be
poorly chosen.

regards, tom lane

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Martijn van Oosterhout (#2)

Re: Why search term results different query plan?

Martijn van Oosterhout <kleptog@svana.org> writes:

I don't really see how you can do any real estimates on %X% type queries,
although maybe it's assuming longer string => less matches.

More exactly, it's assuming more fixed characters in the pattern ->
less matches; see like_selectivity in src/backend/utils/adt/selfuncs.c.
While the specific numbers it's using are made from whole cloth, I think
the principle should hold good.

I don't see any way to accumulate actual statistics that would improve
the estimate, do you?

regards, tom lane

Martijn van Oosterhout

kleptog@svana.org

over 24 years ago

In reply to: Tom Lane (#4)

Re: Why search term results different query plan?

On Sat, Sep 29, 2001 at 10:37:24PM -0400, Tom Lane wrote:

Martijn van Oosterhout <kleptog@svana.org> writes:

I don't really see how you can do any real estimates on %X% type queries,
although maybe it's assuming longer string => less matches.

More exactly, it's assuming more fixed characters in the pattern ->
less matches; see like_selectivity in src/backend/utils/adt/selfuncs.c.
While the specific numbers it's using are made from whole cloth, I think
the principle should hold good.

I don't see any way to accumulate actual statistics that would improve
the estimate, do you?

It is difficult, although obviously the results he is getting are bunk. The
only problem I can see is that it doesn't appear to be using any of the
gathered statistics at all. For example, if %STAR% matched the most common
value in the column, the selectivity would still be very low.

What I find most interesting about his case is that the query with %A% ran
40 times faster (260s to 6s) than the one where it thought only 1 row in p
was going to match. What that tells me is that the plan used for %A% is more
appropriate even though %STAR% is more selective.

How can you convince the planner of that?

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/

Show quoted text

Magnetism, electricity and motion are like a three-for-two special offer:
if you have two of them, the third one comes free.