Question of Parallel Hash Join on TPC-H Benchmark

Started by Ba Jinsheng · over 1 year ago · 12 messages · bugs
#1Ba Jinsheng
bajinsheng@u.nus.edu

Hi everyone,

Consider query 2 of the TPC-H benchmark:

explain analyze
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
from PART, SUPPLIER, PARTSUPP, NATION, REGION
where p_partkey = ps_partkey
  and s_suppkey = ps_suppkey
  and p_size = 30
  and p_type like '%STEEL'
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = 'ASIA'
  and ps_supplycost = (
    select min(ps_supplycost)
    from PARTSUPP, SUPPLIER, NATION, REGION
    where p_partkey = ps_partkey
      and s_suppkey = ps_suppkey
      and s_nationkey = n_nationkey
      and n_regionkey = r_regionkey
      and r_name = 'ASIA'
  )
order by s_acctbal desc, n_name, s_name, p_partkey
limit 100;

I ran it on PostgreSQL and got this query plan:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=66839.16..66839.17 rows=1 width=195) (actual time=267.054..267.114 rows=100 loops=1)
   ->  Sort  (cost=66839.16..66839.17 rows=1 width=195) (actual time=267.053..267.108 rows=100 loops=1)
         Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, part.p_partkey
         Sort Method: top-N heapsort  Memory: 71kB
         ->  Hash Join  (cost=37874.96..66839.15 rows=1 width=195) (actual time=227.273..266.816 rows=485 loops=1)
               Hash Cond: ((part.p_partkey = partsupp.ps_partkey) AND ((SubPlan 1) = partsupp.ps_supplycost))
               ->  Gather  (cost=1000.00..6458.40 rows=804 width=30) (actual time=0.567..0.733 rows=826 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Parallel Seq Scan on part  (cost=0.00..5378.00 rows=335 width=30) (actual time=0.156..15.960 rows=275 loops=3)
                           Filter: (((p_type)::text ~~ '%STEEL'::text) AND (p_size = 30))
                           Rows Removed by Filter: 66391
               ->  Hash  (cost=30567.96..30567.96 rows=160000 width=175) (actual time=225.816..225.820 rows=160240 loops=1)
                     Buckets: 65536  Batches: 8  Memory Usage: 4648kB
                     ->  Hash Join  (cost=407.96..30567.96 rows=160000 width=175) (actual time=5.200..162.555 rows=160240 loops=1)
                           Hash Cond: (partsupp.ps_suppkey = supplier.s_suppkey)
                           ->  Seq Scan on partsupp  (cost=0.00..25560.00 rows=800000 width=14) (actual time=0.016..59.835 rows=800000 loops=1)
                           ->  Hash  (cost=382.96..382.96 rows=2000 width=169) (actual time=5.169..5.172 rows=2003 loops=1)
                                 Buckets: 2048  Batches: 1  Memory Usage: 413kB
                                 ->  Hash Join  (cost=2.46..382.96 rows=2000 width=169) (actual time=0.108..4.336 rows=2003 loops=1)
                                       Hash Cond: (supplier.s_nationkey = nation.n_nationkey)
                                       ->  Seq Scan on supplier  (cost=0.00..323.00 rows=10000 width=144) (actual time=0.013..2.478 rows=10000 loops=1)
                                       ->  Hash  (cost=2.40..2.40 rows=5 width=33) (actual time=0.069..0.071 rows=5 loops=1)
                                             Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                             ->  Hash Join  (cost=1.07..2.40 rows=5 width=33) (actual time=0.060..0.066 rows=5 loops=1)
                                                   Hash Cond: (nation.n_regionkey = region.r_regionkey)
                                                   ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 width=37) (actual time=0.011..0.013 rows=25 loops=1)
                                                   ->  Hash  (cost=1.06..1.06 rows=1 width=4) (actual time=0.029..0.029 rows=1 loops=1)
                                                         Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                         ->  Seq Scan on region  (cost=0.00..1.06 rows=1 width=4) (actual time=0.014..0.015 rows=1 loops=1)
                                                               Filter: (r_name = 'ASIA'::bpchar)
                                                               Rows Removed by Filter: 4
               SubPlan 1
                 ->  Aggregate  (cost=48.70..48.71 rows=1 width=32) (actual time=0.019..0.019 rows=1 loops=1311)
                       ->  Nested Loop  (cost=0.85..48.70 rows=1 width=6) (actual time=0.014..0.018 rows=1 loops=1311)
                             Join Filter: (region_1.r_regionkey = nation_1.n_regionkey)
                             Rows Removed by Join Filter: 3
                             ->  Seq Scan on region region_1  (cost=0.00..1.06 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1311)
                                   Filter: (r_name = 'ASIA'::bpchar)
                                   Rows Removed by Filter: 4
                             ->  Nested Loop  (cost=0.85..47.58 rows=4 width=10) (actual time=0.010..0.017 rows=4 loops=1311)
                                   ->  Nested Loop  (cost=0.71..46.96 rows=4 width=10) (actual time=0.009..0.013 rows=4 loops=1311)
                                         ->  Index Scan using partsupp_pkey on partsupp partsupp_1  (cost=0.42..13.75 rows=4 width=10) (actual time=0.007..0.008 rows=4 loops=1311)
                                               Index Cond: (ps_partkey = part.p_partkey)
                                         ->  Index Scan using supplier_pkey on supplier supplier_1  (cost=0.29..8.30 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=5244)
                                               Index Cond: (s_suppkey = partsupp_1.ps_suppkey)
                                   ->  Index Scan using nation_pkey on nation nation_1  (cost=0.14..0.16 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=5244)
                                         Index Cond: (n_nationkey = supplier_1.s_nationkey)
(50 rows)

I am wondering why the Hash Join on the fifth line is not executed in parallel.
According to the documentation at https://www.postgresql.org/docs/current/parallel-safety.html, my understanding is that it should be parallelized, which would give a more efficient query plan such as the following:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=42019.25..42019.25 rows=1 width=195) (actual time=134.698..142.258 rows=100 loops=1)
   ->  Sort  (cost=42019.25..42019.25 rows=1 width=195) (actual time=134.697..142.252 rows=100 loops=1)
         Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, part.p_partkey
         Sort Method: top-N heapsort  Memory: 69kB
         ->  Gather  (cost=26845.97..42019.24 rows=1 width=195) (actual time=116.843..142.014 rows=485 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Hash Join  (cost=25845.97..41019.14 rows=1 width=195) (actual time=120.427..130.569 rows=162 loops=3)
                     Hash Cond: ((part.p_partkey = partsupp.ps_partkey) AND ((SubPlan 1) = partsupp.ps_supplycost))
                     ->  Parallel Seq Scan on part  (cost=0.00..5378.00 rows=335 width=30) (actual time=0.102..9.739 rows=275 loops=3)
                           Filter: (((p_type)::text ~~ '%STEEL'::text) AND (p_size = 30))
                           Rows Removed by Filter: 66391
                     ->  Parallel Hash  (cost=23217.97..23217.97 rows=66667 width=175) (actual time=97.833..97.837 rows=53413 loops=3)
                           Buckets: 65536  Batches: 8  Memory Usage: 4800kB
                           ->  Hash Join  (cost=407.96..23217.97 rows=66667 width=175) (actual time=4.596..69.280 rows=53413 loops=3)
                                 Hash Cond: (partsupp.ps_suppkey = supplier.s_suppkey)
                                 ->  Parallel Seq Scan on partsupp  (cost=0.00..20893.33 rows=333333 width=14) (actual time=0.039..26.690 rows=266667 loops=3)
                                 ->  Hash  (cost=382.96..382.96 rows=2000 width=169) (actual time=4.543..4.545 rows=2003 loops=3)
                                       Buckets: 2048  Batches: 1  Memory Usage: 413kB
                                       ->  Hash Join  (cost=2.46..382.96 rows=2000 width=169) (actual time=0.106..3.658 rows=2003 loops=3)
                                             Hash Cond: (supplier.s_nationkey = nation.n_nationkey)
                                             ->  Seq Scan on supplier  (cost=0.00..323.00 rows=10000 width=144) (actual time=0.015..1.071 rows=10000 loops=3)
                                             ->  Hash  (cost=2.40..2.40 rows=5 width=33) (actual time=0.070..0.072 rows=5 loops=3)
                                                   Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                   ->  Hash Join  (cost=1.07..2.40 rows=5 width=33) (actual time=0.061..0.067 rows=5 loops=3)
                                                         Hash Cond: (nation.n_regionkey = region.r_regionkey)
                                                         ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 width=37) (actual time=0.009..0.011 rows=25 loops=3)
                                                         ->  Hash  (cost=1.06..1.06 rows=1 width=4) (actual time=0.031..0.031 rows=1 loops=3)
                                                               Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                               ->  Seq Scan on region  (cost=0.00..1.06 rows=1 width=4) (actual time=0.014..0.016 rows=1 loops=3)
                                                                     Filter: (r_name = 'ASIA'::bpchar)
                                                                     Rows Removed by Filter: 4
                     SubPlan 1
                       ->  Aggregate  (cost=48.70..48.71 rows=1 width=32) (actual time=0.017..0.017 rows=1 loops=3810)
                             ->  Nested Loop  (cost=0.85..48.70 rows=1 width=6) (actual time=0.012..0.016 rows=1 loops=3810)
                                   Join Filter: (region_1.r_regionkey = nation_1.n_regionkey)
                                   Rows Removed by Join Filter: 3
                                   ->  Seq Scan on region region_1  (cost=0.00..1.06 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=3810)
                                         Filter: (r_name = 'ASIA'::bpchar)
                                         Rows Removed by Filter: 4
                                   ->  Nested Loop  (cost=0.85..47.58 rows=4 width=10) (actual time=0.006..0.015 rows=4 loops=3810)
                                         ->  Nested Loop  (cost=0.71..46.96 rows=4 width=10) (actual time=0.005..0.011 rows=4 loops=3810)
                                               ->  Index Scan using partsupp_pkey on partsupp partsupp_1  (cost=0.42..13.75 rows=4 width=10) (actual time=0.003..0.004 rows=4 loops=3810)
                                                     Index Cond: (ps_partkey = part.p_partkey)
                                               ->  Index Scan using supplier_pkey on supplier supplier_1  (cost=0.29..8.30 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=15240)
                                                     Index Cond: (s_suppkey = partsupp_1.ps_suppkey)
                                         ->  Index Scan using nation_pkey on nation nation_1  (cost=0.14..0.16 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=15240)
                                               Index Cond: (n_nationkey = supplier_1.s_nationkey)
(50 rows)

Best regards,

Jinsheng Ba

Notice: This email is generated from the account of an NUS alumnus. Contents, views, and opinions therein are solely those of the sender.

#2Ba Jinsheng
bajinsheng@u.nus.edu
In reply to: Ba Jinsheng (#1)
Re: Question of Parallel Hash Join on TPC-H Benchmark

Forgot to mention:

TPC-H data scale: 1 GB
PostgreSQL version: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a94d5b3728dd3e5fd4adb25350712eb785bb9ab6


#3Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ba Jinsheng (#2)
Re: Question of Parallel Hash Join on TPC-H Benchmark

The planner likely does not believe it'd be a win. It's not just about
parallel safety - starting parallel workers and copying the data between
processes is not free, and the planner tries to not do parallel stuff
unnecessarily. I'd bet 1GB is fairly small for parallelism.

You can try setting parallel_setup_cost and parallel_tuple_cost to 0,
and reducing min_parallel_{table,index}_scan_size. That should force a
parallel plan, and you'll see if parallelism is helpful.
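These are session-level settings, so the experiment can be run with SET before repeating EXPLAIN ANALYZE. A minimal sketch of what that could look like (zeroed costs and scan-size thresholds are for experimentation only, not production):

```sql
-- Make parallel plans look essentially free to the planner,
-- so it chooses them even for small inputs.
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;

-- Let the planner consider parallel scans on small relations.
SET min_parallel_table_scan_size = 0;
SET min_parallel_index_scan_size = 0;

-- Re-run the query to see whether parallelism actually wins.
EXPLAIN ANALYZE
SELECT ...;  -- TPC-H Q2 as above
```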

regards

--
Tomas Vondra

#4Zhang Mingli
zmlpostgres@gmail.com
In reply to: Tomas Vondra (#3)
Re: Question of Parallel Hash Join on TPC-H Benchmark

Hi,

Zhang Mingli
www.hashdata.xyz
On Oct 10, 2024 at 22:41 +0800, Tomas Vondra <tomas@vondra.me>, wrote:

You can try setting parallel_setup_cost and parallel_tuple_cost to 0,
and reducing min_parallel_{table,index}_scan_size. That should force a
parallel plan, and you'll see if parallelism is helpful.

In addition to this, it seems you're after a parallel-aware Hash Join.
Check whether enable_parallel_hash = on; EXPLAIN (VERBOSE) will show the related GUCs.
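Taken together, these suggestions boil down to session-level settings along the following lines (a sketch with deliberately extreme values, just to force parallel plans for the experiment):

```sql
-- Make parallel plans essentially free to set up...
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
-- ...and allow parallelism even on small relations.
SET min_parallel_table_scan_size = 0;
SET min_parallel_index_scan_size = 0;
-- Parallel-aware Hash Join must not be disabled.
SET enable_parallel_hash = on;
```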

#5Ba Jinsheng
bajinsheng@u.nus.edu
In reply to: Zhang Mingli (#4)
Re: Question of Parallel Hash Join on TPC-H Benchmark

Thanks for your replies!

I configured these variables, but got a worse query plan:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=57899.45..57899.45 rows=1 width=195) (actual time=4261.283..4261.806 rows=100 loops=1)
-> Sort (cost=57899.45..57899.45 rows=1 width=195) (actual time=4261.281..4261.795 rows=100 loops=1)
Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, part.p_partkey
Sort Method: top-N heapsort Memory: 71kB
-> Hash Join (cost=29025.99..57899.44 rows=1 width=195) (actual time=281.174..4261.118 rows=485 loops=1)
Hash Cond: ((part.p_partkey = partsupp.ps_partkey) AND ((SubPlan 1) = partsupp.ps_supplycost))
-> Gather (cost=0.00..5378.00 rows=804 width=30) (actual time=0.607..0.807 rows=826 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on part (cost=0.00..5378.00 rows=335 width=30) (actual time=0.156..17.662 rows=275 loops=3)
Filter: (((p_type)::text ~~ '%STEEL'::text) AND (p_size = 30))
Rows Removed by Filter: 66391
-> Hash (cost=22718.99..22718.99 rows=160000 width=175) (actual time=108.521..108.833 rows=160240 loops=1)
Buckets: 65536 Batches: 8 Memory Usage: 4648kB
-> Gather (cost=297.99..22718.99 rows=160000 width=175) (actual time=5.148..53.343 rows=160240 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Hash Join (cost=297.99..22718.99 rows=66667 width=175) (actual time=2.076..71.222 rows=53413 loops=3)
Hash Cond: (partsupp.ps_suppkey = supplier.s_suppkey)
-> Parallel Seq Scan on partsupp (cost=0.00..20893.33 rows=333333 width=14) (actual time=0.036..28.325 rows=266667 loops=3)
-> Parallel Hash (cost=287.58..287.58 rows=833 width=169) (actual time=1.600..1.602 rows=668 loops=3)
Buckets: 2048 Batches: 1 Memory Usage: 464kB
-> Parallel Hash Join (cost=2.28..287.58 rows=833 width=169) (actual time=0.047..1.870 rows=1002 loops=2)
Hash Cond: (supplier.s_nationkey = nation.n_nationkey)
-> Parallel Seq Scan on supplier (cost=0.00..264.67 rows=4167 width=144) (actual time=0.004..0.518 rows=5000 loops=2)
-> Parallel Hash (cost=2.25..2.25 rows=3 width=33) (actual time=0.020..0.023 rows=2 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 40kB
-> Parallel Hash Join (cost=1.05..2.25 rows=3 width=33) (actual time=0.028..0.040 rows=5 loops=1)
Hash Cond: (nation.n_regionkey = region.r_regionkey)
-> Parallel Seq Scan on nation (cost=0.00..1.15 rows=15 width=37) (actual time=0.002..0.004 rows=25 loops=1)
-> Parallel Hash (cost=1.04..1.04 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 40kB
-> Parallel Seq Scan on region (cost=0.00..1.04 rows=1 width=4) (actual time=0.005..0.006 rows=1 loops=1)
Filter: (r_name = 'ASIA'::bpchar)
Rows Removed by Filter: 4
SubPlan 1
-> Aggregate (cost=48.67..48.68 rows=1 width=32) (actual time=3.144..3.144 rows=1 loops=1311)
-> Nested Loop (cost=0.85..48.67 rows=1 width=6) (actual time=1.306..3.143 rows=1 loops=1311)
Join Filter: (region_1.r_regionkey = nation_1.n_regionkey)
Rows Removed by Join Filter: 3
-> Gather (cost=0.00..1.04 rows=1 width=4) (actual time=0.647..3.118 rows=1 loops=1311)
Workers Planned: 1
Workers Launched: 1
-> Parallel Seq Scan on region region_1 (cost=0.00..1.04 rows=1 width=4) (actual time=0.001..0.002 rows=0 loops=2622)
Filter: (r_name = 'ASIA'::bpchar)
Rows Removed by Filter: 2
-> Nested Loop (cost=0.85..47.58 rows=4 width=10) (actual time=0.010..0.024 rows=4 loops=1311)
-> Nested Loop (cost=0.71..46.96 rows=4 width=10) (actual time=0.008..0.017 rows=4 loops=1311)
-> Index Scan using partsupp_pkey on partsupp partsupp_1 (cost=0.42..13.75 rows=4 width=10) (actual time=0.005..0.006 rows=4 loops=1311)
Index Cond: (ps_partkey = part.p_partkey)
-> Index Scan using supplier_pkey on supplier supplier_1 (cost=0.29..8.30 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=5244)
Index Cond: (s_suppkey = partsupp_1.ps_suppkey)
-> Index Scan using nation_pkey on nation nation_1 (cost=0.14..0.16 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=5244)
Index Cond: (n_nationkey = supplier_1.s_nationkey)
Planning Time: 10.079 ms
Execution Time: 4262.062 ms
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

However, I found that the Hash Join in the fifth line is still not parallel, while the other Hash Joins are. This is not what I intended.

I tried another thing: I reset every configuration parameter to its default and applied the following patch:

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b4e085e9d4..a0dc032d79 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -803,8 +803,6 @@ max_parallel_hazard_test(char proparallel, max_parallel_hazard_context *context)
                        Assert(context->max_hazard != PROPARALLEL_UNSAFE);
                        context->max_hazard = proparallel;
                        /* done if we are not expecting any unsafe functions */
-                       if (context->max_interesting == proparallel)
-                               return true;
                        break;
                case PROPARALLEL_UNSAFE:
                        context->max_hazard = proparallel;

With the patch I get a more efficient query plan. Compared to the original execution under the default configuration, the estimated cost is reduced by around 37% and the execution time by around 50%.

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=42019.25..42019.25 rows=1 width=195) (actual time=134.698..142.258 rows=100 loops=1)
-> Sort (cost=42019.25..42019.25 rows=1 width=195) (actual time=134.697..142.252 rows=100 loops=1)
Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, part.p_partkey
Sort Method: top-N heapsort Memory: 69kB
-> Gather (cost=26845.97..42019.24 rows=1 width=195) (actual time=116.843..142.014 rows=485 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Hash Join (cost=25845.97..41019.14 rows=1 width=195) (actual time=120.427..130.569 rows=162 loops=3)
Hash Cond: ((part.p_partkey = partsupp.ps_partkey) AND ((SubPlan 1) = partsupp.ps_supplycost))
-> Parallel Seq Scan on part (cost=0.00..5378.00 rows=335 width=30) (actual time=0.102..9.739 rows=275 loops=3)
Filter: (((p_type)::text ~~ '%STEEL'::text) AND (p_size = 30))
Rows Removed by Filter: 66391
-> Parallel Hash (cost=23217.97..23217.97 rows=66667 width=175) (actual time=97.833..97.837 rows=53413 loops=3)
Buckets: 65536 Batches: 8 Memory Usage: 4800kB
-> Hash Join (cost=407.96..23217.97 rows=66667 width=175) (actual time=4.596..69.280 rows=53413 loops=3)
Hash Cond: (partsupp.ps_suppkey = supplier.s_suppkey)
-> Parallel Seq Scan on partsupp (cost=0.00..20893.33 rows=333333 width=14) (actual time=0.039..26.690 rows=266667 loops=3)
-> Hash (cost=382.96..382.96 rows=2000 width=169) (actual time=4.543..4.545 rows=2003 loops=3)
Buckets: 2048 Batches: 1 Memory Usage: 413kB
-> Hash Join (cost=2.46..382.96 rows=2000 width=169) (actual time=0.106..3.658 rows=2003 loops=3)
Hash Cond: (supplier.s_nationkey = nation.n_nationkey)
-> Seq Scan on supplier (cost=0.00..323.00 rows=10000 width=144) (actual time=0.015..1.071 rows=10000 loops=3)
-> Hash (cost=2.40..2.40 rows=5 width=33) (actual time=0.070..0.072 rows=5 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Hash Join (cost=1.07..2.40 rows=5 width=33) (actual time=0.061..0.067 rows=5 loops=3)
Hash Cond: (nation.n_regionkey = region.r_regionkey)
-> Seq Scan on nation (cost=0.00..1.25 rows=25 width=37) (actual time=0.009..0.011 rows=25 loops=3)
-> Hash (cost=1.06..1.06 rows=1 width=4) (actual time=0.031..0.031 rows=1 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on region (cost=0.00..1.06 rows=1 width=4) (actual time=0.014..0.016 rows=1 loops=3)
Filter: (r_name = 'ASIA'::bpchar)
Rows Removed by Filter: 4
SubPlan 1
-> Aggregate (cost=48.70..48.71 rows=1 width=32) (actual time=0.017..0.017 rows=1 loops=3810)
-> Nested Loop (cost=0.85..48.70 rows=1 width=6) (actual time=0.012..0.016 rows=1 loops=3810)
Join Filter: (region_1.r_regionkey = nation_1.n_regionkey)
Rows Removed by Join Filter: 3
-> Seq Scan on region region_1 (cost=0.00..1.06 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=3810)
Filter: (r_name = 'ASIA'::bpchar)
Rows Removed by Filter: 4
-> Nested Loop (cost=0.85..47.58 rows=4 width=10) (actual time=0.006..0.015 rows=4 loops=3810)
-> Nested Loop (cost=0.71..46.96 rows=4 width=10) (actual time=0.005..0.011 rows=4 loops=3810)
-> Index Scan using partsupp_pkey on partsupp partsupp_1 (cost=0.42..13.75 rows=4 width=10) (actual time=0.003..0.004 rows=4 loops=3810)
Index Cond: (ps_partkey = part.p_partkey)
-> Index Scan using supplier_pkey on supplier supplier_1 (cost=0.29..8.30 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=15240)
Index Cond: (s_suppkey = partsupp_1.ps_suppkey)
-> Index Scan using nation_pkey on nation nation_1 (cost=0.14..0.16 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=15240)
Index Cond: (n_nationkey = supplier_1.s_nationkey)
Planning Time: 2.046 ms
Execution Time: 142.437 ms
(50 rows)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Best regards,

Jinsheng Ba


#6Andrei Lepikhov
lepihov@gmail.com
In reply to: Ba Jinsheng (#5)
Re: Question of Parallel Hash Join on TPC-H Benchmark

On 10/10/2024 23:52, Ba Jinsheng wrote:

Thanks for your replies!

I configured these variables and did get a worse query plan:

Could you provide a SQL dump and settings to play with this case locally?
I would like to figure out what we need to transform SubPlan 1 into a SEMI JOIN.

Also, I usually force parallel workers with settings like below:

max_parallel_workers_per_gather = 32
min_parallel_table_scan_size = 0
min_parallel_index_scan_size = 0
max_worker_processes = 64
parallel_setup_cost = 0.001
parallel_tuple_cost = 0.0001

just to be sure it's not a case where something down the plan tree
forbids generating a parallel query plan.

P.S.
To discover other options, the AQO extension may be helpful. It stores
cardinalities from all plan nodes after execution and uses them to
calculate selectivities on subsequent executions.

--
regards, Andrei Lepikhov

#7Ba Jinsheng
bajinsheng@u.nus.edu
In reply to: Andrei Lepikhov (#6)
Re: Question of Parallel Hash Join on TPC-H Benchmark

Could you provide SQL dump and settings to play with this case locally?

The dump file is too big, so I put it on Google Drive: https://drive.google.com/file/d/1e0s6ZLKLEPbZzS6BzftwpmVspWi7Okd1/view?usp=sharing

I also share my data directory here: https://drive.google.com/file/d/1ZBLHanIRwxbaMQIhRUSPv4I7y8g_0AWi/view?usp=sharing

Also, I usually force parallel workers with settings like below:

max_parallel_workers_per_gather = 32
min_parallel_table_scan_size = 0
min_parallel_index_scan_size = 0
max_worker_processes = 64
parallel_setup_cost = 0.001
parallel_tuple_cost = 0.0001

I tried these configuration parameters and got the same worse query plan: the Hash Join in the fifth line is still not parallel, while the following Hash Joins are.
This is an inefficient query plan.

I changed the code to generate an efficient query plan (only the Hash Join in the fifth line is parallel), so I am wondering whether it is possible to optimize the code to produce this efficient query plan by default. I believe it would at least improve the performance of PostgreSQL on the standard TPC-H benchmark.
If needed, I can provide my environment as a Docker image for your analysis.

Best regards,

Jinsheng Ba


#8Andrei Lepikhov
lepihov@gmail.com
In reply to: Ba Jinsheng (#7)
Re: Question of Parallel Hash Join on TPC-H Benchmark

On 13/10/2024 01:11, Ba Jinsheng wrote:

I changed the code to generate an efficient query plan (only the
HashJoin in fifth line is in parallel), so I am wondering whether it is
possible to optimize the code to enable this efficient query plan in
default? I believe at least, it would improve the performance of
PostgreSQL on the standard benchmark TPC-H.
If you need, I can provide my environment in docker for your analysis.

Thanks for sharing the case! I will investigate it in a while.

--
regards, Andrei Lepikhov

#9Andrei Lepikhov
lepihov@gmail.com
In reply to: Ba Jinsheng (#7)
Re: Question of Parallel Hash Join on TPC-H Benchmark

On 13/10/2024 01:11, Ba Jinsheng wrote:

I changed the code to generate an efficient query plan (only the
HashJoin in fifth line is in parallel), so I am wondering whether it is
possible to optimize the code to enable this efficient query plan in
default? I believe at least, it would improve the performance of
PostgreSQL on the standard benchmark TPC-H.
If you need, I can provide my environment in docker for your analysis.

Thanks for sharing! It turned out to be a pretty interesting case.

The answer to your question is simple: we have a correlated subquery
here because of the clause 'p_partkey = ps_partkey'. Because of that, the top
JOIN operator has a parameterised inner side, and parallel workers can't be
used in the current state of the code. See the comment in the code:

/*
* If the inner path is parameterised, we can't use a partial hashjoin.
* Parameterised partial paths are not supported. The caller should
* already have verified that no lateral rels are required here.
*/

Even if you transform the subquery into a SEMI JOIN, you will still have a
parameterised join, and the parallel plan will be declined.
So, the best you can do here is to replace the clause
'ps_supplycost = (SELECT ...)' with 'ps_supplycost IN (SELECT ...)'.
I got quite a beneficial speedup there (see attached).
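Applied to Q2, the suggested predicate change would look roughly like this (a sketch: the subquery body stays exactly as in the original query, only the comparison operator changes):

```sql
-- Before: ps_supplycost = (SELECT min(ps_supplycost) ...)
-- After:
... and ps_supplycost in (
        select min(ps_supplycost)
        from partsupp, supplier, nation, region
        where p_partkey = ps_partkey
          and s_suppkey = ps_suppkey
          and s_nationkey = n_nationkey
          and n_regionkey = r_regionkey
          and r_name = 'ASIA')
```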

So, what can we improve here?
- I am doubtful about supporting parallel plans for parameterised paths,
at least any time soon.
- Improve pull-ups for subqueries, especially when we can prove that the
subquery contains a single aggregate returning only one tuple. That looks doable.
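Such a pull-up would effectively decorrelate the min() subquery into a grouped derived table that joins without parameterisation, roughly like this (a hand-written sketch of the equivalent rewrite, not something the planner currently produces):

```sql
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
from part
join partsupp on p_partkey = partsupp.ps_partkey
join supplier on s_suppkey = partsupp.ps_suppkey
join nation   on n_nationkey = s_nationkey
join region   on r_regionkey = n_regionkey
join (  -- one grouped scan replaces the per-row correlated subquery
      select ps_partkey, min(ps_supplycost) as min_cost
      from partsupp
      join supplier on s_suppkey = ps_suppkey
      join nation   on n_nationkey = s_nationkey
      join region   on r_regionkey = n_regionkey
      where r_name = 'ASIA'
      group by ps_partkey) mc
  on mc.ps_partkey = partsupp.ps_partkey
 and partsupp.ps_supplycost = mc.min_cost
where p_size = 30
  and p_type like '%STEEL'
  and r_name = 'ASIA'
order by s_acctbal desc, n_name, s_name, p_partkey
limit 100;
```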

--
regards, Andrei Lepikhov

Attachments:

final_explain.txt (text/plain; charset=UTF-8)
#10Ba Jinsheng
bajinsheng@u.nus.edu
In reply to: Andrei Lepikhov (#9)
Re: Question of Parallel Hash Join on TPC-H Benchmark

Thanks for looking into it!
We are working on a research project that automatically looks for such cases.
If this matters for PostgreSQL, I will report more of them.

Best regards,

Jinsheng Ba


#11Andrei Lepikhov
lepihov@gmail.com
In reply to: Ba Jinsheng (#10)
Re: Question of Parallel Hash Join on TPC-H Benchmark

On 13/10/2024 15:57, Ba Jinsheng wrote:

Thanks for looking into it!
We are working on a research project that automatically looks for such cases.
If this matters for PostgreSQL, I will report more of them.

I made an attempt last year [1]. However, at least in production, it
provided too many false-positive cases to trigger automatic
re-optimisation [2] on each poorly estimated query plan.
I wonder if you have made any progress in that area. Living near Singapore
(in Chon Buri, Thailand), I would like to attend a meetup on this
problem, if you organised one.
Of course, reporting performance issues is always beneficial.

[1]: https://github.com/danolivo/pg_track_optimizer
[2]: https://postgrespro.com/docs/enterprise/16/realtime-query-replanning

--
regards, Andrei Lepikhov

#12Ba Jinsheng
bajinsheng@u.nus.edu
In reply to: Andrei Lepikhov (#11)
Re: Question of Parallel Hash Join on TPC-H Benchmark

I wonder if you have made any progress in that area. Living near Singapore
(in Chon Buri, Thailand), I would like to attend a meetup on this
problem, if you organised one.

That is great! Unfortunately, I moved to ETH Zurich, Switzerland, as a postdoc last month.
Our research is still at an early exploration stage, and we would be happy to talk more about it once we have made significant progress.

Of course, reporting performance issues is always beneficial.

Sounds good! I will report more cases that I believe are real performance issues.

Best regards,

Jinsheng Ba
