hash joins are causing no space left error

Started by Ayub Mover 5 years ago5 messagesgeneral
Jump to latest
#1Ayub M
hiayub@gmail.com

This is PostgreSQL 11 on AWS, there is a mview query in this OLAP database,
the tables involved are huge - 50-100m records on average records hundreds
of columns in most cases. The query runs for a while and then errors out
saying "No space left on device". I could see it generating around 500gb of
temp file data. At times it goes thru and at times it fails - probably due
to other queries running at the same time and causing failure.

The tables are partitioned and indexed on the PKs and FKs. Using
parallelism (4) with increased work_mem (4gb).

The joins are happening on around 10 tables and all are joining on the PK
and FK columns. I see partition pruning happening but the hash joins are
killing the query.

Is there any way to avoid hash joins? If we create hash indexes on the
joining columns, would PostgreSQL avoid hashing operation and instead use
hash indexes on the tables and join them. That way I feel resource
intensive hashing would be avoided and there wont be any need of temp
files. I tried but does not seem to work, when I query the table with
specific values then it uses the hash index but when I am joining the
tables it seems to do its own hash join.

My question is how to optimize massive table joins in PostgreSQL to resolve
- avoid space failures and make it run fast - takes a couple of hours to
complete now. Any best practices or suggestions.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ayub M (#1)
Re: hash joins are causing no space left error

Ayub M <hiayub@gmail.com> writes:

This is PostgreSQL 11 on AWS, there is a mview query in this OLAP database,
the tables involved are huge - 50-100m records on average records hundreds
of columns in most cases. The query runs for a while and then errors out
saying "No space left on device". I could see it generating around 500gb of
temp file data. At times it goes thru and at times it fails - probably due
to other queries running at the same time and causing failure.

Are you sure that these queries are actually producing the answers you
want? It sounds suspiciously like you are computing underconstrained
joins.

The joins are happening on around 10 tables and all are joining on the PK
and FK columns. I see partition pruning happening but the hash joins are
killing the query.
Is there any way to avoid hash joins?

TBH, you are asking the wrong question. A merge join would take about as
much temporary space, and a nestloop join over so much data would probably
not finish in an amount of time you're willing to wait. Indexes are NOT
a magic solution here. What you need to be thinking about is how to not
need to process so much data.

If you really need to have this proven to you, you can try "set
enable_hashjoin = off", but I don't think you'll find that better.

regards, tom lane

#3Michael Lewis
mlewis@entrata.com
In reply to: Ayub M (#1)
Re: hash joins are causing no space left error

On Wed, Aug 12, 2020 at 5:52 PM Ayub M <hiayub@gmail.com> wrote:

This is PostgreSQL 11 on AWS, there is a mview query in this OLAP
database, the tables involved are huge - 50-100m records on average records
hundreds of columns in most cases.

How many tables and how many partitions each? Can you share an EXPLAIN
output? Are the tables being joined partitioned in such a way to allow
partition wise joins? Have you enabled partition wise joins config? There
are many enhancements for partitioning in PG12, do you have the option to
upgrade?

#4Ayub M
hiayub@gmail.com
In reply to: Michael Lewis (#3)
Re: hash joins are causing no space left error

Michael, below is the query and the execution plan. Yes the tables are
partitioned and its using parallel options. Yes I could do the upgrade if I
can show the benefits, please check the explain plan and let me know what
pg12 features you are referring to which could help.

SELECT
...
250+ cols from various below tables
...FROM
x.table1 yankee_charlieINNER JOIN x.juliet_juliet juliet_alpha ON
yankee_charlie.bravo_tango = juliet_alpha.kilo_fourLEFT OUTER
JOIN y.sierra_delta kilo_foxtrot ON
yankee_charlie.sierra_six = kilo_foxtrot.november_julietLEFT
OUTER JOIN z.xray_bravo uniform_delta ON
yankee_charlie.alpha_four = uniform_delta.papa_mikeLEFT OUTER
JOIN x.papa_whiskey india_five ON
yankee_charlie.golf = india_five.tango_mikeLEFT OUTER JOIN
x.lima_romeo hotel ON
yankee_charlie.zulu_oscar = hotel.bravo_hotelLEFT OUTER JOIN
x.romeo_golf foxtrot_whiskey_two ON
yankee_charlie.xray_alpha =
foxtrot_whiskey_two.tango_quebecLEFT OUTER JOIN a.seven_yankee
zulu_four ON
yankee_charlie.uniform_india = zulu_four.seven_bravoLEFT OUTER
JOIN a.quebec_november kilo_lima ON
zulu_four.mike_four = kilo_lima.lima_uniformLEFT OUTER JOIN
a.tango_romeo romeo_xray_echo ON
kilo_lima.lima_uniform = romeo_xray_echo.lima_uniformLEFT
OUTER JOIN b.seven_three four_hotel ON
kilo_lima.zulu_three = four_hotel.whiskey_victor_bravoLEFT
OUTER JOIN x.juliet_yankee five_quebec ON
yankee_charlie.oscar_india = five_quebec.six_xrayLEFT OUTER
JOIN z.romeo_two delta_mike ON
yankee_charlie.four_zulu = delta_mike.whiskey_victor_sevenLEFT
OUTER JOIN z.delta_lima six_alpha ON
yankee_charlie.xray_three = six_alpha.kilo_whiskeyLEFT OUTER
JOIN x.five_hotel xray_quebec ON
yankee_charlie.bravo_tango = xray_quebec.kilo_fourLEFT OUTER
JOIN y.sierra_delta mike_foxtrot ON
yankee_charlie.two = mike_foxtrot.november_julietLEFT OUTER
JOIN y.sierra_delta india_three ON
yankee_charlie.victor = india_three.november_julietWHERE
yankee_charlie.romeo_xray_two >= (CURRENT_DATE - INTERVAL '5 years')
AND yankee_charlie.romeo_xray_two <
papa_five('year',(CURRENT_DATE + INTERVAL '1 year')) - INTERVAL '1
day';

Gather (cost=33464846.41..475412138.09 rows=97965031 width=7161)
Workers Planned: 2
-> Parallel Hash Left Join (cost=33463846.41..465614634.99
rows=40818763 width=7161)
Hash Cond: (yankee_charlie.victor = india_three.november_juliet)
-> Parallel Hash Left Join (cost=33330811.86..392519286.24
rows=40818763 width=7109)
Hash Cond: (yankee_charlie.two = mike_foxtrot.november_juliet)
-> Hash Left Join (cost=33197777.31..321716804.91
rows=40818763 width=7056)
Hash Cond: (yankee_charlie.xray_three =
six_alpha.kilo_whiskey)
-> Hash Left Join
(cost=33197713.71..321608781.15 rows=40818763 width=7003)
Hash Cond: (yankee_charlie.four_zulu =
delta_mike.whiskey_victor_seven)
-> Hash Left Join
(cost=33197035.05..321500899.07 rows=40818763 width=6863)
Hash Cond:
(yankee_charlie.oscar_india = five_quebec.six_xray)
-> Parallel Hash Left Join
(cost=33196883.64..321393345.56 rows=40818763 width=6813)
Hash Cond:
(yankee_charlie.bravo_tango = xray_quebec.kilo_four)
-> Parallel Hash Left Join
(cost=29850433.79..255866124.43 rows=40818763 width=6125)
Hash Cond:
(zulu_four.mike_four = kilo_lima.lima_uniform)
-> Hash Left Join
(cost=27572665.00..192250116.73 rows=40818763 width=6070)
Hash Cond:
(yankee_charlie.zulu_oscar = hotel.bravo_hotel)
-> Parallel Hash
Left Join (cost=27571519.35..192141780.40 rows=40818763 width=6042)
Hash Cond:
(yankee_charlie.alpha_four = uniform_delta.papa_mike)
-> Parallel
Hash Join (cost=27569303.10..192032398.49 rows=40818763 width=5775)
Hash
Cond: (yankee_charlie.bravo_tango = foxtrot_whiskey_bravo2.kilo_four)
->
Parallel Hash Left Join (cost=3696445.91..128666530.60 rows=40818763
width=2497)

Hash Cond: (yankee_charlie.sierra_six = kilo_foxtrot.november_juliet)
->
Parallel Hash Left Join (cost=3550202.36..106708533.27 rows=40818763
width=2147)

Hash Cond: (yankee_charlie.xray_alpha =
foxtrot_whiskey_two.tango_quebec)

-> Parallel Hash Left Join (cost=1366012.90..84660500.52
rows=40818763 width=1926)

Hash Cond: (yankee_charlie.uniform_india =
zulu_four.seven_bravo)

-> Parallel Hash Left Join (cost=3031.30..65010702.64
rows=40818763 width=1781)

Hash Cond: (yankee_charlie.golf =
india_five.tango_mike)

-> Parallel Append (cost=0.12..64900513.56
rows=40818780 width=1835)

Subplans Removed: 25

-> Parallel Index Scan using november_mike on
quebec_victor yankee_charlie (cost=0.12..8.15 rows=1 width=10798)

Filter: ((romeo_xray_two >= (CURRENT_DATE
- 'uniform_tango'::interval)) AND (romeo_xray_two <
(papa_five('quebec_four'::text, (CURRENT_DATE +
'foxtrot_juliet'::interval)) - 'seven_oscar'::interval)))

-> Parallel Index Scan using alpha_six on
xray_foxtrot romeo_oscar (cost=0.12..8.15 rows=1 width=10798)

Filter: ((romeo_xray_two >= (CURRENT_DATE
- 'uniform_tango'::interval)) AND (romeo_xray_two <
(papa_five('quebec_four'::text, (CURRENT_DATE +
'foxtrot_juliet'::interval)) - 'seven_oscar'::interval)))

-> Parallel Index Scan using whiskey_three on
yankee_bravo charlie (cost=0.12..8.15 rows=1 width=10798)

Filter: ((romeo_xray_two >= (CURRENT_DATE
- 'uniform_tango'::interval)) AND (romeo_xray_two <
(papa_five('quebec_four'::text, (CURRENT_DATE +
'foxtrot_juliet'::interval)) - 'seven_oscar'::interval)))

-> Parallel Index Scan using quebec_india on
five_yankee quebec_foxtrot (cost=0.12..8.15 rows=1 width=10798)

Filter: ((romeo_xray_two >= (CURRENT_DATE
- 'uniform_tango'::interval)) AND (romeo_xray_two <
(papa_five('quebec_four'::text, (CURRENT_DATE +
'foxtrot_juliet'::interval)) - 'seven_oscar'::interval)))

-> Parallel Seq Scan on romeo_zulu oscar_seven
(cost=0.00..48358699.80 rows=30351009 width=1975)

Filter: ((romeo_xray_two >= (CURRENT_DATE
- 'uniform_tango'::interval)) AND (romeo_xray_two <
(papa_five('quebec_four'::text, (CURRENT_DATE +
'foxtrot_juliet'::interval)) - 'seven_oscar'::interval)))

-> Parallel Seq Scan on xray_seven five_golf
(cost=0.00..16337483.57 rows=10467742 width=1428)

Filter: ((romeo_xray_two >= (CURRENT_DATE
- 'uniform_tango'::interval)) AND (romeo_xray_two <
(papa_five('quebec_four'::text, (CURRENT_DATE +
'foxtrot_juliet'::interval)) - 'seven_oscar'::interval)))

-> Parallel Hash (cost=2643.30..2643.30 rows=31030
width=76)

-> Parallel Seq Scan on papa_whiskey india_five
(cost=0.00..2643.30 rows=31030 width=76)

-> Parallel Hash (cost=1131848.93..1131848.93 rows=5529893
width=210)

-> Parallel Seq Scan on seven_yankee zulu_four
(cost=0.00..1131848.93 rows=5529893 width=210)

-> Parallel Hash (cost=1797395.54..1797395.54 rows=6623354
width=351)

-> Parallel Seq Scan on romeo_golf foxtrot_whiskey_two
(cost=0.00..1797395.54 rows=6623354 width=351)
->
Parallel Hash (cost=122037.80..122037.80 rows=365580 width=415)

-> Parallel Seq Scan on sierra_delta kilo_foxtrot
(cost=0.00..122037.80 rows=365580 width=415)
->
Parallel Hash (cost=10995540.28..10995540.28 rows=30397353
width=3343)
->
Parallel Append (cost=0.00..10995540.28 rows=30397353 width=3343)

-> Parallel Seq Scan on kilo_november lima_five
(cost=0.00..8362872.25 rows=22396025 width=3681)

-> Parallel Seq Scan on five_hotel foxtrot_oscar
(cost=0.00..2480681.27 rows=8001327 width=2398)

-> Parallel Seq Scan on juliet_juliet juliet_alpha
(cost=0.00..0.00 rows=1 width=7566)
-> Parallel
Hash (cost=2012.78..2012.78 rows=16278 width=332)
->
Parallel Seq Scan on xray_bravo uniform_delta (cost=0.00..2012.78
rows=16278 width=332)
-> Hash
(cost=1008.62..1008.62 rows=10962 width=158)
-> Seq Scan
on lima_romeo hotel (cost=0.00..1008.62 rows=10962 width=158)
-> Parallel Hash
(cost=2112348.26..2112348.26 rows=4256042 width=185)
-> Hash Left Join
(cost=671798.94..2112348.26 rows=4256042 width=185)
Hash Cond:
(kilo_lima.zulu_three = four_hotel.whiskey_victor_bravo)
-> Parallel
Hash Left Join (cost=670990.36..2100358.89 rows=4256042 width=188)
Hash
Cond: (kilo_lima.lima_uniform = romeo_xray_echo.lima_uniform)
->
Parallel Seq Scan on quebec_november kilo_lima (cost=0.00..1156375.42
rows=4256042 width=176)
->
Parallel Hash (cost=563811.27..563811.27 rows=4253927 width=77)
->
Parallel Seq Scan on tango_romeo romeo_xray_echo
(cost=0.00..563811.27 rows=4253927 width=77)
-> Hash
(cost=740.48..740.48 rows=5448 width=127)
-> Seq
Scan on seven_three four_hotel (cost=0.00..740.48 rows=5448
width=127)
-> Parallel Hash
(cost=2480681.27..2480681.27 rows=8001327 width=753)
-> Parallel Seq Scan on
five_hotel xray_quebec (cost=0.00..2480681.27 rows=8001327 width=753)
-> Hash (cost=128.96..128.96
rows=1796 width=180)
-> Seq Scan on juliet_yankee
five_quebec (cost=0.00..128.96 rows=1796 width=180)
-> Hash (cost=573.85..573.85 rows=8385 width=270)
-> Seq Scan on romeo_two delta_mike
(cost=0.00..573.85 rows=8385 width=270)
-> Hash (cost=56.60..56.60 rows=560 width=183)
-> Seq Scan on delta_lima six_alpha
(cost=0.00..56.60 rows=560 width=183)
-> Parallel Hash (cost=122037.80..122037.80
rows=365580 width=118)
-> Parallel Seq Scan on sierra_delta mike_foxtrot
(cost=0.00..122037.80 rows=365580 width=118)
-> Parallel Hash (cost=122037.80..122037.80 rows=365580 width=118)
-> Parallel Seq Scan on sierra_delta india_three
(cost=0.00..122037.80 rows=365580 width=118)

On Thu, Aug 13, 2020 at 1:10 PM Michael Lewis <mlewis@entrata.com> wrote:

On Wed, Aug 12, 2020 at 5:52 PM Ayub M <hiayub@gmail.com> wrote:

This is PostgreSQL 11 on AWS, there is a mview query in this OLAP
database, the tables involved are huge - 50-100m records on average records
hundreds of columns in most cases.

How many tables and how many partitions each? Can you share an EXPLAIN
output? Are the tables being joined partitioned in such a way to allow
partition wise joins? Have you enabled partition wise joins config? There
are many enhancements for partitioning in PG12, do you have the option to
upgrade?

--
Regards,
Ayub

#5Ayub M
hiayub@gmail.com
In reply to: Tom Lane (#2)
Re: hash joins are causing no space left error

On Wed, Aug 12, 2020 at 8:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ayub M <hiayub@gmail.com> writes:

This is PostgreSQL 11 on AWS, there is a mview query in this OLAP

database,

the tables involved are huge - 50-100m records on average records

hundreds

of columns in most cases. The query runs for a while and then errors out
saying "No space left on device". I could see it generating around 500gb

of

temp file data. At times it goes thru and at times it fails - probably

due

to other queries running at the same time and causing failure.

Are you sure that these queries are actually producing the answers you
want? It sounds suspiciously like you are computing underconstrained
joins.
--> Yes, it is as per the business requirement.

The joins are happening on around 10 tables and all are joining on the PK
and FK columns. I see partition pruning happening but the hash joins are
killing the query.
Is there any way to avoid hash joins?

TBH, you are asking the wrong question. A merge join would take about as
much temporary space, and a nestloop join over so much data would probably
not finish in an amount of time you're willing to wait. Indexes are NOT
a magic solution here. What you need to be thinking about is how to not
need to process so much data.

If you really need to have this proven to you, you can try "set
enable_hashjoin = off", but I don't think you'll find that better.
--> You are right, neither merge join nor nested loop are resolving the
issue.

regards, tom lane

--
Regards,
Ayub