VERY strange query plan (LONG)

Started by Oleg Bartunovover 25 years ago4 messages
#1Oleg Bartunov
oleg@sai.msu.su

Hi,

I tried to implement fulltext search using linguistic approach,
for example, using ispell like udmsearch does. We also save position
information of each lexem in document to calculate relevancy
(it's C-function using SPI-interface). We're still testing different
strategies but found several problems with optimizer, just look at plan -
very strange numbers and no indices used) (I did run vacuume analyze)

explain
select
txt.tid
from
txt, txt_lexem1 tl1_0, txt_lexem11 tl11_0
where
tl1_0.lid =17700
OR
tl11_0.lid =172751
;
NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..16275952180.00 rows=512819420786 width=12)
-> Nested Loop (cost=0.00..891369556.42 rows=512819421 width=8)
-> Seq Scan on txt_lexem11 tl11_0 (cost=0.00..2596.92 rows=132292 width=4)
-> Seq Scan on txt_lexem1 tl1_0 (cost=0.00..3815.95 rows=194795 width=4)
-> Seq Scan on txt (cost=0.00..20.00 rows=1000 width=4)

EXPLAIN

fulltext=# \d txt
          Table "txt"
 Attribute |  Type   | Modifier 
-----------+---------+----------
 tid       | integer | not null
Index: txt_pkey

tables txt_lexemX look like:

fulltext=# \d txt_lexem1
        Table "txt_lexem1"
 Attribute |   Type    | Modifier 
-----------+-----------+----------
 tid       | integer   | not null
 lid       | integer   | not null
 did       | integer   | not null
 count     | integer   | not null
 pos       | integer[] | not null
Index: txt_lexem1_key

We have rewrite using EXISTS and plan looks better !

select
txt.tid
from
txt
where
EXISTS ( select tid from txt_lexem1 tl1_0 where tl1_0.lid=17700 and tl1_0.did=0
and txt.tid=tl1_0.tid )
OR
EXISTS ( select tid from txt_lexem11 tl11_0 where tl11_0.lid=172751 and
tl11_0.did=0 and txt.tid=tl11_0.tid )
;

NOTICE: QUERY PLAN:

Seq Scan on txt (cost=0.00..7416.48 rows=1000 width=4)
SubPlan
-> Index Scan using txt_lexem1_key on txt_lexem1 tl1_0 (cost=0.00..3.95 rows=1 width=4)
-> Index Scan using txt_lexem11_key on txt_lexem11 tl11_0 (cost=0.00..3.45 rows=1 width=4)

EXPLAIN

I've tested on plain 7.0.2 and CVS version.
I remind there was old problem with OR. Does optimizer still has
such problem ?

Regards,

Oleg

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#2Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Oleg Bartunov (#1)
AW: VERY strange query plan (LONG)

very strange numbers and no indices used) (I did run vacuume analyze)

explain
select
txt.tid
from
txt, txt_lexem1 tl1_0, txt_lexem11 tl11_0
where
tl1_0.lid =17700
OR
tl11_0.lid =172751
;
NOTICE: QUERY PLAN:

Did you forget to join the tids together, and the did=0 restrictions ?

Your statement looks very strange (cartesian product), and has nothing in
common with the subselect statements you quoted.

Andreas

#3Oleg Bartunov
oleg@sai.msu.su
In reply to: Zeugswetter Andreas SB (#2)
Re: AW: VERY strange query plan (LONG)

On Thu, 10 Aug 2000, Zeugswetter Andreas SB wrote:

Date: Thu, 10 Aug 2000 10:14:42 +0200
From: Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>
To: 'Oleg Bartunov' <oleg@sai.msu.su>
Cc: "'pgsql-hackers@postgresql.org'" <pgsql-hackers@postgresql.org>
Subject: AW: [HACKERS] VERY strange query plan (LONG)

very strange numbers and no indices used) (I did run vacuume analyze)

explain
select
txt.tid
from
txt, txt_lexem1 tl1_0, txt_lexem11 tl11_0
where
tl1_0.lid =17700
OR
tl11_0.lid =172751
;
NOTICE: QUERY PLAN:

Did you forget to join the tids together, and the did=0 restrictions ?

Your statement looks very strange (cartesian product), and has nothing in
common with the subselect statements you quoted.

You're right, I simplified original query just to show plans.
Here is original query:
explain
select
txt.tid,
tl1_0.count, tl1_0.pos[1] as pos
from
txt, txt_lexem1 tl1_0, txt_lexem11 tl11_0
where
(
( tl1_0.lid in (17700) and tl1_0.did=0 and txt.tid=tl1_0.tid )
OR
( tl11_0.lid in (172751) and tl11_0.did=0 and txt.tid=tl11_0.tid ))

order by count desc, pos asc;

and plan:

NOTICE: QUERY PLAN:

Sort (cost=1278139131.36..1278139131.36 rows=1 width=44)
-> Nested Loop (cost=0.00..1278139131.35 rows=1 width=44)
-> Nested Loop (cost=0.00..1277916858.52 rows=4041 width=40)
-> Seq Scan on txt_lexem11 tl11_0 (cost=0.00..2596.92 rows=132292 width=12)
-> Seq Scan on txt_lexem1 tl1_0 (cost=0.00..3815.95 rows=194795 width=28)
-> Seq Scan on txt (cost=0.00..20.00 rows=1000 width=4)

EXPLAIN

Interesthing that plan for AND looks realistic (and uses indices):
explain
select
txt.tid,
tl1_0.count, tl1_0.pos[1] as pos
from
txt, txt_lexem1 tl1_0, txt_lexem11 tl11_0
where
(
( tl1_0.lid in (17700) and tl1_0.did=0 and txt.tid=tl1_0.tid )
AND
( tl11_0.lid in (172751) and tl11_0.did=0 and txt.tid=tl11_0.tid ))

order by count desc, pos asc;
NOTICE: QUERY PLAN:

Sort (cost=109.05..109.05 rows=1 width=28)
-> Nested Loop (cost=0.00..109.04 rows=1 width=28)
-> Nested Loop (cost=0.00..87.69 rows=3 width=24)
-> Index Scan using txt_lexem11_key on txt_lexem11 tl11_0 (cost=0.00..35.23 rows=13 width=4)
-> Index Scan using txt_lexem1_key on txt_lexem1 tl1_0 (cost=0.00..3.95 rows=1 width=20)
-> Index Scan using txt_pkey on txt (cost=0.00..8.14 rows=10 width=4)

EXPLAIN

We could live with fulltext search using only AND but very strange
plan for OR worry me.

Regards,

Oleg

Andreas

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#4Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Oleg Bartunov (#3)
AW: AW: VERY strange query plan (LONG)

Your statement looks very strange (cartesian product), and

has nothing in

common with the subselect statements you quoted.

You're right, I simplified original query just to show plans.
Here is original query:
explain
select
txt.tid,
tl1_0.count, tl1_0.pos[1] as pos
from
txt, txt_lexem1 tl1_0, txt_lexem11 tl11_0
where
(
( tl1_0.lid in (17700) and tl1_0.did=0 and txt.tid=tl1_0.tid )
OR
( tl11_0.lid in (172751) and tl11_0.did=0 and txt.tid=tl11_0.tid ))

order by count desc, pos asc;

That still does not lead to the same result as your subselect.
Looks like the subselect is really what you want in the first place.
The problem with above is that for the two or'ed clauses there
is no restriction for the respective 3rd table, thus still producing
a cartesian product (the and'ed clauses wont produce that,
thus correct plan for and).

A little better, but still not same result would be:
where
txt.tid=tl1_0.tid and txt.tid=tl11_0.tid and
(( tl1_0.lid in (17700) and tl1_0.did=0)
OR
( tl11_0.lid in (172751) and tl11_0.did=0))

Andreas