BUG #6787: Query planner estimates wrong costs on nodes
The following bug has been logged on the website:
Bug reference: 6787
Logged by: Giorgio Saviane
Email address: gsaviane@gmail.com
PostgreSQL version: 8.4.12
Operating system: Linux
Description:
Hi, I would like to submit the explain analyze results of a query with
enable_seqscan disabled and enabled.
This is the query:
SELECT count(*)
FROM vb_messages msg
INNER JOIN vb_readings r ON msg.vb_reading_id = r.id
INNER JOIN vb_product_readings pr ON pr.vb_reading_id = r.id
WHERE msg.reception_date >= '2012-07-01'
  AND msg.reception_date < '2012-07-02'
The explain analyze result with set enable_seqscan=false is:
Aggregate  (cost=17228990.83..17228990.84 rows=1 width=0) (actual time=1847.380..1847.380 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..17228943.48 rows=18939 width=0) (actual time=0.064..1567.371 rows=392336 loops=1)
        ->  Nested Loop  (cost=0.00..396208.94 rows=11176 width=8) (actual time=0.049..494.031 rows=35761 loops=1)
              ->  Index Scan using ix_vb_messages_reception_date on vb_messages msg  (cost=0.00..94347.00 rows=81650 width=4) (actual time=0.024..94.808 rows=101522 loops=1)
                    Index Cond: ((reception_date >= '2012-07-01 00:00:00'::timestamp without time zone) AND (reception_date < '2012-07-02 00:00:00'::timestamp without time zone))
              ->  Index Scan using vb_readings_pkey on vb_readings r  (cost=0.00..3.68 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=101522)
                    Index Cond: (r.id = msg.vb_reading_id)
        ->  Index Scan using ix_vb_product_readings_vb_reading_id on vb_product_readings pr  (cost=0.00..1490.53 rows=1250 width=4) (actual time=0.004..0.013 rows=11 loops=35761)
              Index Cond: (pr.vb_reading_id = msg.vb_reading_id)
Total runtime: 1847.439 ms
The explain analyze result with set enable_seqscan=true is:
Aggregate  (cost=3301249.86..3301249.87 rows=1 width=0) (actual time=258887.238..258887.239 rows=1 loops=1)
  ->  Hash Join  (cost=396345.01..3301202.51 rows=18939 width=0) (actual time=18401.256..258561.539 rows=392336 loops=1)
        Hash Cond: (pr.vb_reading_id = msg.vb_reading_id)
        ->  Seq Scan on vb_product_readings pr  (cost=0.00..2443541.36 rows=122967136 width=4) (actual time=0.102..130182.321 rows=141509801 loops=1)
        ->  Hash  (cost=396205.31..396205.31 rows=11176 width=8) (actual time=592.774..592.774 rows=35761 loops=1)
              ->  Nested Loop  (cost=0.00..396205.31 rows=11176 width=8) (actual time=0.160..556.412 rows=35761 loops=1)
                    ->  Index Scan using ix_vb_messages_reception_date on vb_messages msg  (cost=0.00..94347.00 rows=81650 width=4) (actual time=0.068..113.526 rows=101522 loops=1)
                          Index Cond: ((reception_date >= '2012-07-01 00:00:00'::timestamp without time zone) AND (reception_date < '2012-07-02 00:00:00'::timestamp without time zone))
                    ->  Index Scan using vb_readings_pkey on vb_readings r  (cost=0.00..3.68 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=101522)
                          Index Cond: (r.id = msg.vb_reading_id)
Total runtime: 258887.596 ms
As you can see, the tables are all indexed on their foreign keys and freshly
analyzed.
I wonder why the planner estimates a much lower cost with sequential scans
enabled, yet the actual execution time is much higher than with sequential
scans disabled.
It seems that either the cost assigned to a sequential scan over 120 million
records is too low or, conversely, the cost assigned to two nested loops over
three indexes is too high.
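For reference, the two plans above were obtained by toggling enable_seqscan
at the session level before each EXPLAIN ANALYZE run (a sketch; the query is
the one quoted at the top of this report):

```sql
-- Sketch of the comparison: session-level planner toggle, same query twice.
SET enable_seqscan = off;   -- planner is forced toward the index/nested-loop plan
EXPLAIN ANALYZE
SELECT count(*)
FROM vb_messages msg
INNER JOIN vb_readings r ON msg.vb_reading_id = r.id
INNER JOIN vb_product_readings pr ON pr.vb_reading_id = r.id
WHERE msg.reception_date >= '2012-07-01'
  AND msg.reception_date < '2012-07-02';

SET enable_seqscan = on;    -- back to the default; planner picks the hash join
-- ... re-run the same EXPLAIN ANALYZE statement and compare the output.
```

Note that enable_seqscan only affects the current session and is a debugging
aid, not a production fix.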
gsaviane@gmail.com writes:
> It seems that either the cost assigned to a sequential scan over 120 million
> records is too low or, conversely, the cost assigned to two nested loops over
> three indexes is too high.
It looks from here like you are measuring a situation where
vb_product_readings is fully cached in RAM. If that's the normal
situation in your database, then you ought to decrease the
random_page_cost setting to make the planner's cost estimates reflect
your reality. The default setting is intended to model situations
where a fair amount of disk access is going to happen.
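A session-level experiment along these lines (the value 1.5 is purely
illustrative; the right number depends on how much of the working set is
actually cached on this machine) might look like:

```sql
-- Try a lower random_page_cost for the current session first (the default is 4).
SET random_page_cost = 1.5;  -- illustrative value, not a recommendation

-- Re-run EXPLAIN ANALYZE on the original query and see whether the planner
-- now prefers the index/nested-loop plan on its own merits.

-- If the new setting consistently produces better plans, persist it in
-- postgresql.conf (and reload) so it applies to all sessions:
--   random_page_cost = 1.5
```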
regards, tom lane