-HEAD planner issue wrt hash_joins on dbt3 ?
Hi!
I'm about to do some benchmarking on -HEAD on some hardware I have
available, and it seems I'm hitting a rather weird issue causing the osdl
dbt3 benchmark to run very slowly, eating CPU time for hours ...
it seems that the issue is caused by the following query:
(in case it gets linewrapped:
http://www.kaltenbrunner.cc/files/dbt3_with_hashjoin.txt)
select n_name,
       sum(l_extendedprice * (1 - l_discount)) as revenue
from customer, orders, lineitem, supplier, nation, region
where c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and l_suppkey = s_suppkey
  and c_nationkey = s_nationkey
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = 'AFRICA'
  and o_orderdate >= date '1993-01-01'
  and o_orderdate < date '1993-01-01' + interval '1 year'
group by n_name
order by revenue desc;
that results in the following plan on my box:
Sort  (cost=2543391.75..2543391.81 rows=25 width=37)
  Sort Key: sum((lineitem.l_extendedprice * (1::double precision - lineitem.l_discount)))
  ->  HashAggregate  (cost=2543390.73..2543391.17 rows=25 width=37)
        ->  Hash Join  (cost=440864.81..2543027.40 rows=72666 width=37)
              Hash Cond: ((orders.o_custkey = customer.c_custkey) AND (supplier.s_nationkey = customer.c_nationkey))
              ->  Hash Join  (cost=377714.59..2415568.01 rows=1816643 width=49)
                    Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                    ->  Nested Loop  (cost=13.65..1719683.85 rows=12000672 width=49)
                          ->  Merge Join  (cost=0.00..10248.66 rows=20000 width=41)
                                Merge Cond: (nation.n_nationkey = supplier.s_nationkey)
                                ->  Nested Loop  (cost=0.00..19.19 rows=5 width=33)
                                      ->  Index Scan using pk_nation on nation  (cost=0.00..9.38 rows=25 width=37)
                                      ->  Index Scan using pk_region on region  (cost=0.00..0.38 rows=1 width=4)
                                            Index Cond: (nation.n_regionkey = region.r_regionkey)
                                            Filter: (r_name = 'AFRICA'::bpchar)
                                ->  Index Scan using i_s_nationkey on supplier  (cost=0.00..9779.46 rows=100000 width=8)
                          ->  Bitmap Heap Scan on lineitem  (cost=13.65..77.16 rows=665 width=16)
                                Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey)
                                ->  Bitmap Index Scan on i_l_suppkey  (cost=0.00..13.65 rows=665 width=0)
                                      Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)
                    ->  Hash  (cost=372023.51..372023.51 rows=2270971 width=8)
                          ->  Bitmap Heap Scan on orders  (cost=41391.94..372023.51 rows=2270971 width=8)
                                Recheck Cond: ((o_orderdate >= '1993-01-01'::date) AND (o_orderdate < '1994-01-01 00:00:00'::timestamp without time zone))
                                ->  Bitmap Index Scan on i_o_orderdate  (cost=0.00..41391.94 rows=2270971 width=0)
                                      Index Cond: ((o_orderdate >= '1993-01-01'::date) AND (o_orderdate < '1994-01-01 00:00:00'::timestamp without time zone))
              ->  Hash  (cost=55647.15..55647.15 rows=1500615 width=8)
                    ->  Seq Scan on customer  (cost=0.00..55647.15 rows=1500615 width=8)
(27 rows)
so it really thinks that doing hashes with gigantic amounts of data is
a good idea generally - this seems to be independent of work_mem - the
plan looks the same with 1MB vs 126MB (which I had during the run).
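(for reference, these are plain session settings - just an illustrative
sketch, not the exact session:)

-- the plan above is identical at both of these settings:
set work_mem = '1MB';
set work_mem = '126MB';
-- the toggle used to get the second plan below:
set enable_hashjoin = off;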
the profile of the backend eating the cpu looks similar to:
26351 27.9047 ExecScanHashBucket
8239 8.7248 hash_seq_search
6984 7.3958 hash_search_with_hash_value
setting enable_hashjoin to off results in a runtime of about 2.5 minutes:
(http://www.kaltenbrunner.cc/files/dbt3_without_hashjoin.txt)
Sort  (cost=3700257.38..3700257.45 rows=25 width=37) (actual time=286820.962..286820.968 rows=5 loops=1)
  Sort Key: sum((lineitem.l_extendedprice * (1::double precision - lineitem.l_discount)))
  ->  HashAggregate  (cost=3700256.37..3700256.80 rows=25 width=37) (actual time=286820.932..286820.941 rows=5 loops=1)
        ->  Nested Loop  (cost=730956.43..3699893.04 rows=72666 width=37) (actual time=43551.767..286488.555 rows=72441 loops=1)
              Join Filter: (customer.c_nationkey = supplier.s_nationkey)
              ->  Merge Join  (cost=730956.43..3624153.73 rows=1816643 width=49) (actual time=43281.710..257082.739 rows=1822547 loops=1)
                    Merge Cond: (lineitem.l_orderkey = orders.o_orderkey)
                    ->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..2715943.34 rows=60003360 width=16) (actual time=32.868..123668.380 rows=59991868 loops=1)
                    ->  Sort  (cost=730956.43..732091.92 rows=454194 width=41) (actual time=43248.797..45754.223 rows=1822547 loops=1)
                          Sort Key: orders.o_orderkey
                          ->  Merge Join  (cost=670885.68..688278.21 rows=454194 width=41) (actual time=34469.359..42050.059 rows=455262 loops=1)
                                Merge Cond: (customer.c_custkey = orders.o_custkey)
                                ->  Sort  (cost=59105.79..59856.10 rows=300123 width=41) (actual time=8113.826..8491.532 rows=299493 loops=1)
                                      Sort Key: customer.c_custkey
                                      ->  Nested Loop  (cost=781.13..31801.81 rows=300123 width=41) (actual time=107.537..7461.355 rows=299493 loops=1)
                                            ->  Nested Loop  (cost=1.06..11.00 rows=5 width=33) (actual time=0.030..0.296 rows=5 loops=1)
                                                  Join Filter: (nation.n_regionkey = region.r_regionkey)
                                                  ->  Index Scan using pk_nation on nation  (cost=0.00..9.38 rows=25 width=37) (actual time=0.007..0.063 rows=25 loops=1)
                                                  ->  Materialize  (cost=1.06..1.07 rows=1 width=4) (actual time=0.002..0.004 rows=1 loops=25)
                                                        ->  Seq Scan on region  (cost=0.00..1.06 rows=1 width=4) (actual time=0.009..0.018 rows=1 loops=1)
                                                              Filter: (r_name = 'AFRICA'::bpchar)
                                            ->  Bitmap Heap Scan on customer  (cost=780.07..5607.85 rows=60025 width=8) (actual time=61.150..1331.466 rows=59899 loops=5)
                                                  Recheck Cond: (nation.n_nationkey = customer.c_nationkey)
                                                  ->  Bitmap Index Scan on i_c_nationkey  (cost=0.00..780.07 rows=60025 width=0) (actual time=44.637..44.637 rows=59899 loops=5)
                                                        Index Cond: (nation.n_nationkey = customer.c_nationkey)
                                ->  Sort  (cost=611779.89..617457.31 rows=2270971 width=8) (actual time=26355.515..29471.963 rows=2276859 loops=1)
                                      Sort Key: orders.o_custkey
                                      ->  Bitmap Heap Scan on orders  (cost=41391.94..372023.51 rows=2270971 width=8) (actual time=1630.604..16266.102 rows=2276859 loops=1)
                                            Recheck Cond: ((o_orderdate >= '1993-01-01'::date) AND (o_orderdate < '1994-01-01 00:00:00'::timestamp without time zone))
                                            ->  Bitmap Index Scan on i_o_orderdate  (cost=0.00..41391.94 rows=2270971 width=0) (actual time=1352.037..1352.037 rows=2276859 loops=1)
                                                  Index Cond: ((o_orderdate >= '1993-01-01'::date) AND (o_orderdate < '1994-01-01 00:00:00'::timestamp without time zone))
              ->  Index Scan using pk_supplier on supplier  (cost=0.00..0.03 rows=1 width=8) (actual time=0.010..0.012 rows=1 loops=1822547)
                    Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)
Total runtime: 286984.386 ms
(34 rows)
(about 120s seem to be explain analyze overhead here)
fwiw the box in question is a dual 2.6GHz Opteron with 8GB of RAM - wal
is on the BBWC-onboard Smartarray (RAID 10 on 4 disks) and the data is
on a 14 disk Linux Software RAID 10, running Debian Sarge/AMD64 with
Kernel 2.6.17.7. the dbt3 database got initialized with a scaling factor
of 10 (running with just 1 works fine).
Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
I'm about to do some benchmarking on -HEAD on some hardware I have
available, and it seems I'm hitting a rather weird issue causing the osdl
dbt3 benchmark to run very slowly, eating CPU time for hours ...
Could we see the actual EXPLAIN ANALYZE results for the slow plan?
I'm unconvinced by your "hash join is bad" analysis, especially in
the cases where you're giving it lots of work_mem. I think it's got
something to do with the different join orders. The rowcount estimates
in the fast plan all seem pretty good, but I'm betting something is
wrong with some of them in the slow case.
regards, tom lane
Tom Lane wrote:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
I'm about to do some benchmarking on -HEAD on some hardware I have
available, and it seems I'm hitting a rather weird issue causing the osdl
dbt3 benchmark to run very slowly, eating CPU time for hours ...
Could we see the actual EXPLAIN ANALYZE results for the slow plan?
I'm unconvinced by your "hash join is bad" analysis, especially in
the cases where you're giving it lots of work_mem. I think it's got
something to do with the different join orders. The rowcount estimates
in the fast plan all seem pretty good, but I'm betting something is
wrong with some of them in the slow case.
will do - but that will take a while - the slow one runs for 12h or so
even without explain analyze overhead ...
Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
btw - the "hashjoin is bad" was more or less based on the observation
that nearly all of the cpu is burned in hash-related functions in the
profile (when profiling over a longer period of time those accumulate
even more % of the time than in the short profile I included in the
original report)
[ shrug... ] Two out of the three functions you mentioned are not used
by hash join, and anyway the other plan probably has a comparable
execution density in sort-related functions; does that make it bad?
It's possible that the large time for ExecScanHashBucket has something
to do with skewed usage of the hash buckets due to an unfortunate data
distribution, but that's theorizing far in advance of the data.
regards, tom lane
Tom Lane wrote:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
btw - the "hashjoin is bad" was more or less based on the observation
that nearly all of the cpu is burned in hash-related functions in the
profile (when profiling over a longer period of time those accumulate
even more % of the time than in the short profile I included in the
original report)
[ shrug... ] Two out of the three functions you mentioned are not used
by hash join, and anyway the other plan probably has a comparable
execution density in sort-related functions; does that make it bad?
hmm sorry for that - I should have checked the source before I made that
assumption :-(
It's possible that the large time for ExecScanHashBucket has something
to do with skewed usage of the hash buckets due to an unfortunate data
distribution, but that's theorizing far in advance of the data.
http://www.kaltenbrunner.cc/files/4/
has preliminary data of the dbt3/scaling 10 run I did which seems to
imply we have at least 4 queries in there that take an excessive amount
of time (query 5 is the one I started the complaint with).
However those results have to be taken with a grain of salt since there
is an apparent bug in the dbt3 code which seems to rely on
add_missing_from=on (as can be seen in some of the error logs of the
database), and towards the end of the throughput run I did some of the
explain analyzes for the report (those are the small 100% spikes in the
graph, due to the box using the second CPU to run them).
I will redo those tests later this week though ...
Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
Tom Lane wrote:
Could we see the actual EXPLAIN ANALYZE results for the slow plan?
Well, indeed it seems that the hash join is just an innocent bystander:
the bulk of the runtime (all but about 120 sec in fact) is spent here:
->  Nested Loop  (cost=13.65..1719683.85 rows=12000672 width=49) (actual time=60.325..24923860.713 rows=11897899 loops=1)
      ->  Merge Join  (cost=0.00..10248.66 rows=20000 width=41) (actual time=16.654..2578.060 rows=19837 loops=1)
      ...
      ->  Bitmap Heap Scan on lineitem  (cost=13.65..77.16 rows=665 width=16) (actual time=13.492..1254.535 rows=600 loops=19837)
            Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey)
            ->  Bitmap Index Scan on i_l_suppkey  (cost=0.00..13.65 rows=665 width=0) (actual time=10.662..10.662 rows=600 loops=19837)
                  Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)
I suppose that the profile result you showed was taken during the
startup transient where it was computing the hashtables that this loop's
results are joined to ... but that's not where the problem is. The
problem is repeating that bitmap scan on lineitem for nearly 20000
different l_suppkeys.
Apparently we've made the planner a bit too optimistic about the savings
that can be expected from repeated indexscans occurring on the inside of
a join. The other plan uses a different join order and doesn't try to
join lineitem until it's got orders.o_orderkey, whereupon it does a
mergejoin against an indexscan on lineitem:
-> Index Scan using i_l_orderkey on lineitem (cost=0.00..2715943.34 rows=60003360 width=16) (actual time=32.868..123668.380 rows=59991868 loops=1)
The runtimes for the remainders of the plans are roughly comparable, so
it's the cost of joining lineitem that is hurting here.
Is lineitem sorted (or nearly sorted) by l_orderkey? Part of the
problem could be overestimating the cost of this indexscan.
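(a quick way to eyeball the physical ordering, as a sketch - the
planner's correlation statistic is 1.0 for a perfectly sorted column:)

select attname, correlation
from pg_stats
where tablename = 'lineitem' and attname = 'l_orderkey';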
What are the physical sizes of lineitem and its indexes, and how do
those compare to your RAM? What are you using for planner settings
(particularly effective_cache_size)?
regards, tom lane
Tom Lane wrote:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
Tom Lane wrote:
Could we see the actual EXPLAIN ANALYZE results for the slow plan?
Well, indeed it seems that the hash join is just an innocent bystander:
the bulk of the runtime (all but about 120 sec in fact) is spent here:
->  Nested Loop  (cost=13.65..1719683.85 rows=12000672 width=49) (actual time=60.325..24923860.713 rows=11897899 loops=1)
      ->  Merge Join  (cost=0.00..10248.66 rows=20000 width=41) (actual time=16.654..2578.060 rows=19837 loops=1)
      ...
      ->  Bitmap Heap Scan on lineitem  (cost=13.65..77.16 rows=665 width=16) (actual time=13.492..1254.535 rows=600 loops=19837)
            Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey)
            ->  Bitmap Index Scan on i_l_suppkey  (cost=0.00..13.65 rows=665 width=0) (actual time=10.662..10.662 rows=600 loops=19837)
                  Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)
I suppose that the profile result you showed was taken during the
startup transient where it was computing the hashtables that this loop's
results are joined to ... but that's not where the problem is. The
problem is repeating that bitmap scan on lineitem for nearly 20000
different l_suppkeys.
possible - I actually took them over a longer period of time
Apparently we've made the planner a bit too optimistic about the savings
that can be expected from repeated indexscans occurring on the inside of
a join. The other plan uses a different join order and doesn't try to
join lineitem until it's got orders.o_orderkey, whereupon it does a
mergejoin against an indexscan on lineitem:
->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..2715943.34 rows=60003360 width=16) (actual time=32.868..123668.380 rows=59991868 loops=1)
The runtimes for the remainders of the plans are roughly comparable, so
it's the cost of joining lineitem that is hurting here.
Is lineitem sorted (or nearly sorted) by l_orderkey? Part of the
problem could be overestimating the cost of this indexscan.
What are the physical sizes of lineitem and its indexes, and how do
those compare to your RAM? What are you using for planner settings
(particularly effective_cache_size)?
ouch - you are right (as usual) here.
effective_cache_size was set to 10GB (my fault for copying over the conf
from a 16GB box) during the run - lowering it just a few megabytes(!) or
to a more realistic 6GB results in the following MUCH better plan:
http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
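(the change itself is just a one-line session setting, shown here only
for clarity - the value is what fits this box:)

set effective_cache_size = '6GB';  -- was 10GB, copied from a 16GB machine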
as for the relation sizes:
dbt3=# select pg_relation_size('lineitem');
pg_relation_size
------------------
10832764928
(1 row)
dbt3=# select pg_total_relation_size('lineitem');
pg_total_relation_size
------------------------
22960259072
(1 row)
there are nine btree indexes on lineitem, all between 1.1GB and 1.4GB in
size.
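(for completeness, a sketch of how to list those - pg_size_pretty() just
for readability:)

select indexrelid::regclass as index,
       pg_size_pretty(pg_relation_size(indexrelid)) as size
from pg_index
where indrelid = 'lineitem'::regclass;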
Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
Tom Lane wrote:
Apparently we've made the planner a bit too optimistic about the savings
that can be expected from repeated indexscans occurring on the inside of
a join.
effective_cache_size was set to 10GB (my fault for copying over the conf
from a 16GB box) during the run - lowering it just a few megabytes(!) or
to a more realistic 6GB results in the following MUCH better plan:
http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
Interesting. It used to be that effective_cache_size wasn't all that
critical... what I think this report is showing is that with the 8.2
changes to try to account for caching effects in repeated indexscans,
we've turned that into a pretty significant parameter.
It'd be nice not to have to depend on the DBA to give us a good number
for this setting. But I don't know of any portable ways to find out
how much RAM is in the box, let alone what fraction of it we should
assume is available per-query.
regards, tom lane
Tom Lane wrote:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
Tom Lane wrote:
Apparently we've made the planner a bit too optimistic about the savings
that can be expected from repeated indexscans occurring on the inside of
a join.
effective_cache_size was set to 10GB (my fault for copying over the conf
from a 16GB box) during the run - lowering it just a few megabytes(!) or
to a more realistic 6GB results in the following MUCH better plan:
http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
Interesting. It used to be that effective_cache_size wasn't all that
critical... what I think this report is showing is that with the 8.2
changes to try to account for caching effects in repeated indexscans,
we've turned that into a pretty significant parameter.
yes, I'm a bit worried about that too - it has been a bit of
"conventional wisdom" that setting effective_cache_size optimistically
will never hurt, and that it encourages postgresql to sometimes get a
better plan by favouring index-scans.
It'd be nice not to have to depend on the DBA to give us a good number
for this setting. But I don't know of any portable ways to find out
how much RAM is in the box, let alone what fraction of it we should
assume is available per-query.
well, there are really a number of things the dba had better give the
database accurate information about - though in that case we might go
from "too much won't hurt" to "too much will hurt" ...
Stefan
On Wed, Sep 13, 2006 at 10:47:09AM -0400, Tom Lane wrote:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
Tom Lane wrote:
Apparently we've made the planner a bit too optimistic about the savings
that can be expected from repeated indexscans occurring on the inside of
a join.
effective_cache_size was set to 10GB (my fault for copying over the conf
from a 16GB box) during the run - lowering it just a few megabytes(!) or
to a more realistic 6GB results in the following MUCH better plan:
http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
Interesting. It used to be that effective_cache_size wasn't all
that critical... what I think this report is showing is that with
the 8.2 changes to try to account for caching effects in repeated
indexscans, we've turned that into a pretty significant parameter.
It'd be nice not to have to depend on the DBA to give us a good
number for this setting. But I don't know of any portable ways to
find out how much RAM is in the box, let alone what fraction of it
we should assume is available per-query.
That's fairly straight-forward, if a little crude. We ask the DBA and
provide some tools for estimating and tuning same. :)
Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter
Remember to vote!
[already sent a variant of that yesterday but it doesn't look like it
made it to the list]
Tom Lane wrote:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
Tom Lane wrote:
Apparently we've made the planner a bit too optimistic about the savings
that can be expected from repeated indexscans occurring on the inside of
a join.
effective_cache_size was set to 10GB (my fault for copying over the conf
from a 16GB box) during the run - lowering it just a few megabytes(!) or
to a more realistic 6GB results in the following MUCH better plan:
http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
Interesting. It used to be that effective_cache_size wasn't all that
critical... what I think this report is showing is that with the 8.2
changes to try to account for caching effects in repeated indexscans,
we've turned that into a pretty significant parameter.
took me a while due to hardware issues on my testbox - but there are new
results (with 6GB for effective_cache_size) up at:
http://www.kaltenbrunner.cc/files/5/
there are still a few issues with the validity of the run, like the rf
tests not actually being done right - but lowering effective_cache_size
gave a dramatic speedup on Q5, Q7 and Q8.
that is the explain for the 4h+ Q9:
http://www.kaltenbrunner.cc/files/analyze_q9.txt
increasing the statistics target up to 1000 does not seem to change
the plan btw.
disabling nested loops leads to the following (4 times faster) plan:
http://www.kaltenbrunner.cc/files/analyze_q9_no_nest.txt
since the hash-joins in there look rather slow (inappropriate hashtable
setup due to the wrong estimates?) I disabled hash joins too:
http://www.kaltenbrunner.cc/files/analyze_q9_no_nest_no_hashjoin.txt
and amazingly this plan is by far the fastest one in runtime (15min vs
4.5h ...) except that the planner thinks it is 20 times more expensive ...
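(for completeness, the three plans above come from nothing more exotic
than toggling per-session planner options - sketch only:)

-- default plan (4.5h+):
reset enable_nestloop;
reset enable_hashjoin;

-- the 4x faster plan:
set enable_nestloop = off;

-- the fastest plan (15min), despite a 20x higher estimated cost:
set enable_nestloop = off;
set enable_hashjoin = off;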
Stefan
Stefan Kaltenbrunner wrote:
since the hash-joins in there look rather slow (inappropriate hashtable
setup due to the wrong estimates?) I disabled hash joins too:
http://www.kaltenbrunner.cc/files/analyze_q9_no_nest_no_hashjoin.txt
and amazingly this plan is by far the fastest one in runtime (15min vs
4.5h ...) except that the planner thinks it is 20 times more expensive ...
some additional numbers (the first is with default settings, the second
with enable_nestloop = 'off', the third with enable_nestloop = 'off'
and enable_hashjoin = 'off'):
http://www.kaltenbrunner.cc/files/analyze_q7.txt
here we have a 3x speedup from disabling nested loops, and a 2x speedup
(over the original plan) with both nested loops and hash joins disabled.
http://www.kaltenbrunner.cc/files/analyze_q20.txt
here we have a 180x(!) speedup with both planner options disabled ...
it is worth mentioning that for both queries the estimated costs in
relation to each other look quite reasonable as soon as enable_nestloop
= 'off' (i.e. 5042928 vs 10715247 with 344 sec vs 514 sec for Q7, and
101441851 vs 101445468 with 10 sec vs 11 sec for Q20)
Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
that is the explain for the 4h+ Q9:
http://www.kaltenbrunner.cc/files/analyze_q9.txt
The big problem there seems to be the drastic misestimation of the
number of rows matching the p_name ~~ '%ghost%' condition. What does
pg_stats have for the p_name column?
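(i.e. something along these lines - the DBT-3 "part" table is assumed:)

select null_frac, n_distinct, most_common_vals, most_common_freqs,
       histogram_bounds
from pg_stats
where tablename = 'part' and attname = 'p_name';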
regards, tom lane
Tom Lane wrote:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
that is the explain for the 4h+ Q9:
http://www.kaltenbrunner.cc/files/analyze_q9.txt
The big problem there seems to be the drastic misestimation of the
number of rows matching the p_name ~~ '%ghost%' condition. What does
pg_stats have for the p_name column?
http://www.kaltenbrunner.cc/files/pg_stat_p_name.txt
Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
some additional numbers (first one is with default settings, second is
with enable_nestloop = 'off', third one is with enable_nestloop = 'off'
and enable_hashjoin = 'off'):
I'm inclined to think you still have effective_cache_size set too high;
or at least that the planner is being too optimistic about how much
cache space is actually available to each indexscan.
With the code as it currently stands, effective_cache_size has some of
the same properties as work_mem: the planner effectively assumes that
that much space is available to *each* indexscan, and so you'd need to
de-rate the setting based on the complexity of queries and the number of
concurrent sessions.
I'm not sure what we could do about the concurrent-sessions issue, but
we could make some sort of attack on the query complexity issue by
pro-rating the effective_cache_size among all the tables used by a
query.
http://www.kaltenbrunner.cc/files/analyze_q20.txt
here we have a 180x(!) speedup with both disabled planner options ...
There's something awfully bogus about that one --- how is it that the
aggregate subplan, with the exact same plan and same number of
executions in all three cases, has an actual runtime 200x more in the
first case?
regards, tom lane
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
Tom Lane wrote:
The big problem there seems to be the drastic misestimation of the
number of rows matching the p_name ~~ '%ghost%' condition. What does
pg_stats have for the p_name column?
Hmm ... pattern_sel already applies the operator directly to the
most_common_vals, but in this situation those aren't common enough
to help much. With such an extensive histogram it is awfully tempting
to assume that the histogram members are a representative sample, and
take the selectivity as being the fraction of histogram entries that
match the pattern. Maybe drop the first and last histogram entries
on the grounds they're probably outliers. Thoughts? What would be a
reasonable minimum histogram size to enable using this approach instead
of the guess-on-the-basis-of-the-pattern code?
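(to make the idea concrete, here's the same computation done from SQL
rather than in the selectivity code - a sketch only, using the usual
anyarray-to-text[] cast trick; unnest() is newer than the backend under
discussion, and the endpoint-dropping is left out:)

select avg((h ~~ '%ghost%')::int::float8) as est_selectivity
from (select unnest(histogram_bounds::text::text[]) as h
      from pg_stats
      where tablename = 'part' and attname = 'p_name') s;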
regards, tom lane
Tom Lane wrote:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
some additional numbers (first one is with default settings, second is
with enable_nestloop = 'off', third one is with enable_nestloop = 'off'
and enable_hashjoin = 'off'):
I'm inclined to think you still have effective_cache_size set too high;
or at least that the planner is being too optimistic about how much
cache space is actually available to each indexscan.
I have long-term external monitoring on that server and it indeed shows
that there was never less than about 5.8GB of buffercache used (or
more than 2.2GB used by other means). So 6GB might still be a bit on the
optimistic side, but it is not actually that far off from reality.
I will redo with lower settings - do you have any suggestions for that?
With the code as it currently stands, effective_cache_size has some of
the same properties as work_mem: the planner effectively assumes that
that much space is available to *each* indexscan, and so you'd need to
de-rate the setting based on the complexity of queries and the number of
concurrent sessions.
concurrency is 1 here - there is never more than a single query running
in parallel in those tests.
I'm not sure what we could do about the concurrent-sessions issue, but
we could make some sort of attack on the query complexity issue by
pro-rating the effective_cache_size among all the tables used by a
query.
hmm not sure i understand what you mean here :-(
http://www.kaltenbrunner.cc/files/analyze_q20.txt
here we have a 180x(!) speedup with both disabled planner options ...
There's something awfully bogus about that one --- how is it that the
aggregate subplan, with the exact same plan and same number of
executions in all three cases, has an actual runtime 200x more in the
first case?
hmm - good question. I will redo those in a bit ...
Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
Tom Lane wrote:
I'm not sure what we could do about the concurrent-sessions issue, but
we could make some sort of attack on the query complexity issue by
pro-rating the effective_cache_size among all the tables used by a
query.
hmm not sure i understand what you mean here :-(
Per the comment for index_pages_fetched:
* We assume that effective_cache_size is the total number of buffer pages
* available for both table and index, and pro-rate that space between the
* table and index. (Ideally other_pages should include all the other
* tables and indexes used by the query too; but we don't have a good way
* to get that number here.)
A first-order approximation to this would be to add up the total sizes
of all the other tables used in the query. I am thinking of leaving out
other indexes, mainly because we can't tell at this level which other
indexes are actually gonna get used. This would tend to underestimate
by leaving out indexes, but not by a lot if you assume indexes are much
smaller than their tables. It would also be an overestimate because
tables that are not indexscanned concurrently with the one under
consideration probably shouldn't be counted anyway. So one might hope
these effects would more or less cancel out. Anyway it seems to be a
better idea than what we have now.
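(a back-of-envelope version of that pro-rating for the Q5 tables, just
to make the idea concrete - a 6GB cache split in proportion to heap
size, ignoring indexes; numbers illustrative:)

select relname,
       pg_size_pretty((pg_relation_size(c.oid)::float8
                       / (select sum(pg_relation_size(oid))
                          from pg_class
                          where relname in ('customer', 'orders', 'lineitem',
                                            'supplier', 'nation', 'region'))
                       * 6 * 1024^3)::bigint) as cache_share
from pg_class c
where c.relname in ('customer', 'orders', 'lineitem',
                    'supplier', 'nation', 'region');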
I will redo with lower settings - do you have any suggestions for that ?
Try reducing effective_cache_size to maybe a fourth of what it is now.
If that helps the thing pick better plans for these multi-table queries,
then we should try changing the other_pages calculation as above.
regards, tom lane
On Sun, Sep 17, 2006 at 04:18:36PM -0400, Tom Lane wrote:
* table and index. (Ideally other_pages should include all the other
* tables and indexes used by the query too; but we don't have a good way
* to get that number here.)
A first-order approximation to this would be to add up the total sizes
of all the other tables used in the query. I am thinking of leaving out
other indexes, mainly because we can't tell at this level which other
indexes are actually gonna get used. This would tend to underestimate
by leaving out indexes, but not by a lot if you assume indexes are much
smaller than their tables. It would also be an overestimate because
tables that are not indexscanned concurrently with the one under
consideration probably shouldn't be counted anyway. So one might hope
these effects would more or less cancel out. Anyway it seems to be a
better idea than what we have now.
I think it'd be better to attack this problem from the "other side";
namely looking at what's actually cached. Sadly, I don't think there's
any way to actually query the OS for info about what it has buffered,
but we can look at what's in shared_buffers and assume that it's a
reasonable proxy for the OS's cache. Something like...
relBufPages / shared_buffers * effective_cache_size
should give us a decent idea of what percentage of a relation will be in
cache somewhere. (relBufPages is the number of pages the relation in
question has in the buffer).
Of course, that raises the question of how to track how many pages are
in shared buffers for a relation. Given the criticality of locking
there, we probably don't want to update that info in real-time, but for
this application it's probably OK to just scan through the buffer every
X period of time (maybe after X number of pages read into the buffers).
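(contrib/pg_buffercache already exposes enough to prototype the first
part - a sketch, run as superuser; the relfilenode join is approximate
since it ignores tablespaces:)

select c.relname, count(*) as buffered_pages
from pg_buffercache b
join pg_class c on b.relfilenode = c.relfilenode
where b.reldatabase = (select oid from pg_database
                       where datname = current_database())
group by c.relname
order by buffered_pages desc
limit 10;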
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
"Jim C. Nasby" <jimn@enterprisedb.com> writes:
I think it'd be better to attack this problem from the "other side";
namely looking at what's actually cached.
You can kiss goodbye to plan stability if you go that route... and
in any case I doubt the assumption that what's in shared buffers is
representative of what's in kernel cache.
regards, tom lane