Wrong plan for simple join with index on FK

Started by Pavel Stehuleover 19 years ago7 messages
#1Pavel Stehule
pavel.stehule@hotmail.com

Hello

I test using index on foreign key. I found situation, when planner choose
worse plan.

create table f1(pk serial primary key);
create table f2(fk integer references f1(pk));

insert into f1 select a from generate_series(1,10000) a;
insert into f2 select (random()*9999)::int+1 from generate_series(1,140000);
vacuum analyze;
create index xxx on f2(fk);
\timing
postgres=> select count(*) from f1 join f2 on pk=fk;
count
--------
140000
(1 row)
Time: 538,254 ms
drop index xxx;
postgres=> select count(*) from f1 join f2 on pk=fk;
count
--------
140000
(1 row)
Time: 311,580 ms

Plans:

postgres=> explain select count(*) from f1 join f2 on pk=fk;
QUERY PLAN
--------------------------------------------------------------------------
Aggregate (cost=7788.00..7788.01 rows=1 width=0)
-> Hash Join (cost=170.00..7438.00 rows=140000 width=0)
Hash Cond: (f2.fk = f1.pk)
-> Seq Scan on f2 (cost=0.00..2018.00 rows=140000 width=4)
-> Hash (cost=145.00..145.00 rows=10000 width=4)
-> Seq Scan on f1 (cost=0.00..145.00 rows=10000 width=4)
(6 rows)
postgres=> explain select count(*) from f1 join f2 on pk=fk;
QUERY PLAN
------------------------------------------------------------------------------------
Aggregate (cost=6631.75..6631.76 rows=1 width=0)
-> Merge Join (cost=0.00..6281.75 rows=140000 width=0)
Merge Cond: (f1.pk = f2.fk)
-> Index Scan using f1_pkey on f1 (cost=0.00..187.00 rows=10000
width=4)
-> Index Scan using xxx on f2 (cost=0.00..4319.77 rows=140000
width=4)
(5 rows)

PostgreSQL 8.1, Linux

Regards
Pavel Stehule

_________________________________________________________________
Citite se osamele? Poznejte nekoho vyjmecneho diky Match.com.
http://www.msn.cz/

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: Pavel Stehule (#1)
Re: Wrong plan for simple join with index on FK

On Tue, May 16, 2006 at 11:52:05AM +0200, Pavel Stehule wrote:

Hello

I test using index on foreign key. I found situation, when planner choose
worse plan.

Can we seen an EXPLAIN ANALYZE output to see where the miscalculation
lies. Is it underestimating the cost of the index scan, or
overestimating the cost of the hash join.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#3Pavel Stehule
pavel.stehule@hotmail.com
In reply to: Martijn van Oosterhout (#2)
Re: Wrong plan for simple join with index on FK

Can we seen an EXPLAIN ANALYZE output to see where the miscalculation
lies. Is it underestimating the cost of the index scan, or
overestimating the cost of the hash join.

postgres=> explain analyze select count(*) from f1 join f2 on pk=fk;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=6631.75..6631.76 rows=1 width=0) (actual
time=2433.700..2433.703 rows=1 loops=1)
-> Merge Join (cost=0.00..6281.75 rows=140000 width=0) (actual
time=0.055..1916.815 rows=140000 loops=1)
Merge Cond: (f1.pk = f2.fk)
-> Index Scan using f1_pkey on f1 (cost=0.00..187.00 rows=10000
width=4) (actual time=0.025..45.635 rows=10000 loops=1)
-> Index Scan using xxx on f2 (cost=0.00..4319.77 rows=140000
width=4) (actual time=0.011..812.661 rows=140000 loops=1)
Total runtime: 2433.859 ms
(6 rows)
postgres=> explain analyze select count(*) from f1 join f2 on pk=fk;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=7788.00..7788.01 rows=1 width=0) (actual
time=2216.490..2216.493 rows=1 loops=1)
-> Hash Join (cost=170.00..7438.00 rows=140000 width=0) (actual
time=80.296..1712.505 rows=140000 loops=1)
Hash Cond: (f2.fk = f1.pk)
-> Seq Scan on f2 (cost=0.00..2018.00 rows=140000 width=4)
(actual time=0.031..493.614 rows=140000 loops=1)
-> Hash (cost=145.00..145.00 rows=10000 width=4) (actual
time=80.201..80.201 rows=10000 loops=1)
-> Seq Scan on f1 (cost=0.00..145.00 rows=10000 width=4)
(actual time=0.025..37.587 rows=10000 loops=1)
Total runtime: 2216.730 ms
(7 rows)

Regards
Pavel

_________________________________________________________________
Emotikony a pozadi programu MSN Messenger ozivi vasi konverzaci.
http://messenger.msn.cz/

#4Martijn van Oosterhout
kleptog@svana.org
In reply to: Pavel Stehule (#3)
Re: Wrong plan for simple join with index on FK

On Tue, May 16, 2006 at 12:54:58PM +0200, Pavel Stehule wrote:

Can we seen an EXPLAIN ANALYZE output to see where the miscalculation
lies. Is it underestimating the cost of the index scan, or
overestimating the cost of the hash join.

postgres=> explain analyze select count(*) from f1 join f2 on pk=fk;

First query (merge join):

Apart from the apparent overestimation of the cost of a full index scan
over xxx by about 30%, there seems to be a significant underestimation
of the cost of the merge join.

postgres=> explain analyze select count(*) from f1 join f2 on pk=fk;

Second query (hash join):

Here the estimates seem to be fine, except for an apparent
underestimation of the cost of the aggregate.

These are all minor abberations though, on the whole the estimates are
pretty good. Perhaps you need to tweak the values of random_page_cost
and similar variables.

Have a ncie day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#5Pavel Stehule
pavel.stehule@hotmail.com
In reply to: Martijn van Oosterhout (#4)
Re: Wrong plan for simple join with index on FK

These are all minor abberations though, on the whole the estimates are
pretty good. Perhaps you need to tweak the values of random_page_cost
and similar variables.

Thank You, It's general problem or only mine? I have "100%" standard current
PC.

Pavel

_________________________________________________________________
Najdete si svou lasku a nove pratele na Match.com. http://www.msn.cz/

#6Zeugswetter Andreas DCP SD
ZeugswetterA@spardat.at
In reply to: Pavel Stehule (#5)
Re: Wrong plan for simple join with index on FK

These are all minor abberations though, on the whole the estimates

are

pretty good. Perhaps you need to tweak the values of random_page_cost
and similar variables.

Thank You, It's general problem or only mine? I have "100%"
standard current PC.

The default random_page_cost assumes some concurrent activity. If your
PC does nothing else concurrently, the performance of a seq scan will
be underestimated.

Try to do the statement with some concurrent disk load and you will most
likely
see that the 1. plan is faster. (assuming the tables are not fully
cached)

Andreas

#7Pavel Stehule
pavel.stehule@hotmail.com
In reply to: Zeugswetter Andreas DCP SD (#6)
Re: Wrong plan for simple join with index on FK

These are all minor abberations though, on the whole the estimates

are

pretty good. Perhaps you need to tweak the values of random_page_cost
and similar variables.

Thank You, It's general problem or only mine? I have "100%"
standard current PC.

The default random_page_cost assumes some concurrent activity. If your
PC does nothing else concurrently, the performance of a seq scan will
be underestimated.

Try to do the statement with some concurrent disk load and you will most
likely
see that the 1. plan is faster. (assuming the tables are not fully
cached)

Andreas

ok. I tested it with pgbench and it's true. With -c 50 merge_join is
faster. I didn't expect it.

Thank You
Pavel

_________________________________________________________________
Citite se osamele? Poznejte nekoho vyjmecneho diky Match.com.
http://www.msn.cz/