strange explain

Started by Oleg Bartunovover 23 years ago9 messages
#1Oleg Bartunov
oleg@sai.msu.su

Hi,

I've got performance problem and while I'dont ready to describe it
I'd like to ask about strange explain:

tour=# explain analyze select * from tours where
( operator_id in (2,3,4,5,7) and type_id = 2 ) or
( operator_id = 8 and type_id=4 );

NOTICE: QUERY PLAN:

Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours (cost=0.00..12.25 rows=1 width=1091) (actual time=0.26..0.26 rows=0 loops=1)
Total runtime: 0.45 msec

EXPLAIN

What does many 'type_idx' means ?

Theare are 2 indices - operator_idx and type_idx.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oleg Bartunov (#1)
Re: strange explain

Oleg Bartunov <oleg@sai.msu.su> writes:

tour=# explain analyze select * from tours where
( operator_id in (2,3,4,5,7) and type_id = 2 ) or
( operator_id = 8 and type_id=4 );

Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours (cost=0.00..12.25 rows=1 width=1091) (actual time=0.26..0.26 rows=0 loops=1)

What does many 'type_idx' means ?

Multiple indexscans.

It looks to me like your WHERE clause is being flattened into

( operator_id = 2 and type_id=2 ) or
( operator_id = 3 and type_id=2 ) or
( operator_id = 4 and type_id=2 ) or
( operator_id = 5 and type_id=2 ) or
( operator_id = 7 and type_id=2 ) or
( operator_id = 8 and type_id=4 )

and then it has a choice of repeated indexscans on operator_id or
type_id. Depending on the selectivity stats it might pick either.
You might find that a 2-column index on both would be a win.

regards, tom lane

#3Rod Taylor
rbt@zort.ca
In reply to: Oleg Bartunov (#1)
Re: strange explain

It appears it scanes the type_idx once per opereator. IN gets broken
down into ORs

Is this what the TODO entry 'Make IN / NOT IN have similar performance
as EXISTS' means?
--
Rod
----- Original Message -----
From: "Oleg Bartunov" <oleg@sai.msu.su>
To: "Pgsql Hackers" <pgsql-hackers@postgresql.org>; "Tom Lane"
<tgl@sss.pgh.pa.us>
Sent: Monday, May 13, 2002 9:42 AM
Subject: [HACKERS] strange explain

Hi,

I've got performance problem and while I'dont ready to describe it
I'd like to ask about strange explain:

tour=# explain analyze select * from tours where
( operator_id in (2,3,4,5,7) and type_id = 2 ) or
( operator_id = 8 and type_id=4 );

NOTICE: QUERY PLAN:

Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx,

type_idx on tours (cost=0.00..12.25 rows=1 width=1091) (actual
time=0.26..0.26 rows=0 loops=1)

Total runtime: 0.45 msec

EXPLAIN

What does many 'type_idx' means ?

Theare are 2 indices - operator_idx and type_idx.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---------------------------(end of

broadcast)---------------------------

TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to

majordomo@postgresql.org)

Show quoted text
#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rod Taylor (#3)
Re: strange explain

"Rod Taylor" <rbt@zort.ca> writes:

Is this what the TODO entry 'Make IN / NOT IN have similar performance
as EXISTS' means?

No. The TODO item is talking about IN with a sub-SELECT, which is not
optimized at all at the moment. IN with a list of scalar values is
converted to ((x = value1) OR (x = value2) OR ...), which we can do
something with.

regards, tom lane

#5Oleg Bartunov
oleg@sai.msu.su
In reply to: Tom Lane (#2)
Re: strange explain

Thanks Tom,

On Mon, 13 May 2002, Tom Lane wrote:

Oleg Bartunov <oleg@sai.msu.su> writes:

tour=# explain analyze select * from tours where
( operator_id in (2,3,4,5,7) and type_id = 2 ) or
( operator_id = 8 and type_id=4 );

Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours (cost=0.00..12.25 rows=1 width=1091) (actual time=0.26..0.26 rows=0 loops=1)

What does many 'type_idx' means ?

Multiple indexscans.

It looks to me like your WHERE clause is being flattened into

( operator_id = 2 and type_id=2 ) or
( operator_id = 3 and type_id=2 ) or
( operator_id = 4 and type_id=2 ) or
( operator_id = 5 and type_id=2 ) or
( operator_id = 7 and type_id=2 ) or
( operator_id = 8 and type_id=4 )

this is what I assume.

and then it has a choice of repeated indexscans on operator_id or
type_id. Depending on the selectivity stats it might pick either.
You might find that a 2-column index on both would be a win.

Yes, we've went exactly this way.

I'm very exited how planner could be smart. When I played with the query
and specify different values of type_id I notice it's chose plans depends
on is value exists or not.

regards, tom lane

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#6Oleg Bartunov
oleg@sai.msu.su
In reply to: Tom Lane (#2)
Re: strange explain

Tom,

one more question.

What's the difference for planner between 2 queries ?
For the first query I have plain index scan, but multiple
index scan for second.

tour=# explain analyze select * from tours where
( operator_id in (2,3,4,5,7) and type_id = 2 );
NOTICE: QUERY PLAN:

Index Scan using type_idx on tours (cost=0.00..2.03 rows=1 width=1091) (actual time=0.03..0.03 rows=0 loops=1)
Total runtime: 0.16 msec

EXPLAIN
tour=# explain analyze select * from tours where
( operator_id in (2,3,4,5,7) and type_id = 4 ) or
( operator_id = 8 and type_id = 3);
NOTICE: QUERY PLAN:

Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours (cost=0.00..12.25 rows=1 width=1091) (actual time=0.27..0.27 rows=0 loops=1)
Total runtime: 0.44 msec

EXPLAIN

On Mon, 13 May 2002, Tom Lane wrote:

Oleg Bartunov <oleg@sai.msu.su> writes:

tour=# explain analyze select * from tours where
( operator_id in (2,3,4,5,7) and type_id = 2 ) or
( operator_id = 8 and type_id=4 );

Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours (cost=0.00..12.25 rows=1 width=1091) (actual time=0.26..0.26 rows=0 loops=1)

What does many 'type_idx' means ?

Multiple indexscans.

It looks to me like your WHERE clause is being flattened into

( operator_id = 2 and type_id=2 ) or
( operator_id = 3 and type_id=2 ) or
( operator_id = 4 and type_id=2 ) or
( operator_id = 5 and type_id=2 ) or
( operator_id = 7 and type_id=2 ) or
( operator_id = 8 and type_id=4 )

and then it has a choice of repeated indexscans on operator_id or
type_id. Depending on the selectivity stats it might pick either.
You might find that a 2-column index on both would be a win.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oleg Bartunov (#6)
Re: strange explain

Oleg Bartunov <oleg@sai.msu.su> writes:

What's the difference for planner between 2 queries ?

tour=# explain analyze select * from tours where
( operator_id in (2,3,4,5,7) and type_id = 2 );

tour=# explain analyze select * from tours where
( operator_id in (2,3,4,5,7) and type_id = 4 ) or
( operator_id = 8 and type_id = 3);

The first one's already in normal form and doesn't need any more
flattening. I believe the system will consider a multiple indexscan
on operator_idx for it, but probably the cost estimator is concluding
that that's a loser compared to one indexscan using type_id = 2.
Without any info on the selectivity of these conditions it's hard to say
whether that's a correct choice or not.

regards, tom lane

#8Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#7)
Re: strange explain

EXPLAIN
tour=# explain analyze select * from tours where
( operator_id in (2,3,4,5,7) and type_id = 4 ) or
( operator_id = 8 and type_id = 3);
NOTICE: QUERY PLAN:

Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours (cost=>
0.00..12.25 rows=1 width=1091) (actual time=0.27..0.27 rows=0 loops=1)
Total runtime: 0.44 msec

Actually this plan looks very strange to me. One would expect it to only use
type_idx twice (in lack of a better index (type_id, operator_id)).
Fetch all rows with type_id=4 and then filter the result on operator_id in (...),
then do type_id=3 and filter operator_id=8.
Seems there is room for another performance improvement here :-)

Andreas

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#8)
Re: strange explain

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

tour=# explain analyze select * from tours where
( operator_id in (2,3,4,5,7) and type_id = 4 ) or
( operator_id = 8 and type_id = 3);

Actually this plan looks very strange to me. One would expect it to only use
type_idx twice (in lack of a better index (type_id, operator_id)).
Seems there is room for another performance improvement here :-)

Yeah, this demonstrates that reducing the quals to canonical form isn't
always the best thing to do.

Or maybe we could just look for duplicate indexqual conditions at the
end of the process?

regards, tom lane