BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

Started by PG Bug reporting form · about 5 years ago · 8 messages · bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16885
Logged by: Tavin Cole
Email address: tavincole+pgbg@gmail.com
PostgreSQL version: 13.2
Operating system: Amazon Linux 2
Description:

We're trying to upgrade from the version 9 series and have a deal-breaking
slow query, which runs okay in 10 and 11 but is many times slower in 12+.
I've tested across minor versions from 12.1 through 12.6, and the minor
version doesn't affect it. I've also found the same behavior in 13.2.

The problem lies in the planner choosing a nested loop join instead of the
hash join it should use. Please see this stackoverflow post for the explain
analyze query plans illustrating the difference:
https://stackoverflow.com/questions/66248669/postgresql-upgrade-to-12-changes-hash-join-to-slow-nested-loop

I note the nested loop join filter removes some 60 million rows. This is 60x
the row count of the tables involved.

Postgres is installed from the official pgdg rpm. The upgrade in our test
environment is done with pg_upgrade. Then a full analyze is done before
testing the query. I've tried setting default_statistics_target as high as
500 with no improvement. In fact the query only runs fast before the
statistics are gathered.
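
For reference, the steps described amount to something like the following (the setting values are as stated above; everything else is standard):

```sql
ANALYZE;  -- regather planner statistics after pg_upgrade

-- Raising the per-column sample target did not help in this case:
SET default_statistics_target = 500;  -- default is 100
ANALYZE;
```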

Thanks for your attention!

#2 Magnus Hagander
magnus@hagander.net
In reply to: PG Bug reporting form (#1)
Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

On Mon, Feb 22, 2021 at 11:46 AM PG Bug reporting form
<noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 16885
Logged by: Tavin Cole
Email address: tavincole+pgbg@gmail.com
PostgreSQL version: 13.2
Operating system: Amazon Linux 2
Description:

We're trying to upgrade from the version 9 series and have a deal-breaking
slow query, which runs okay in 10 and 11 but is many times slower in 12+.
I've tested across minor versions from 12.1 through 12.6, and the minor
version doesn't affect it. I've also found the same behavior in 13.2.

The problem lies in the planner choosing a nested loop join instead of the
hash join it should use. Please see this stackoverflow post for the explain
analyze query plans illustrating the difference:
https://stackoverflow.com/questions/66248669/postgresql-upgrade-to-12-changes-hash-join-to-slow-nested-loop

I note the nested loop join filter removes some 60 million rows. This is 60x
the row count of the tables involved.

Postgres is installed from the official pgdg rpm. The upgrade in our test
environment is done with pg_upgrade. Then a full analyze is done before
testing the query. I've tried setting default_statistics_target as high as
500 with no improvement. In fact the query only runs fast before the
statistics are gathered.

Thanks for your attention!

You'll have to post your actual query to get proper feedback.

That said, a typical case for this specifically when going to version
12 would be that CTEs are no longer optimization barriers. If you do
use CTEs in your query, try changing them from "WITH x AS (...)" to
"WITH x AS MATERIALIZED (...)" to return to the previous behaviour and
see if that improves things.
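
A minimal sketch of the suggested change, with hypothetical table and column names:

```sql
-- Before v12 every CTE was an optimization barrier; since v12 a plain
-- "WITH x AS (...)" may be inlined into the outer query, letting the
-- planner push predicates and rearrange joins through it.
WITH x AS MATERIALIZED (          -- restores the pre-12 barrier
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id
)
SELECT *
FROM x
JOIN customers USING (customer_id);
```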

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#3 Tavin Cole
tavin.cole@gmail.com
In reply to: Magnus Hagander (#2)
Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

On Mon, Feb 22, 2021 at 11:56 AM Magnus Hagander <magnus@hagander.net>
wrote:

You'll have to post your actual query to get proper feedback.

That said, a typical case for this specifically when going to version
12 would be that CTEs are no longer optimization barriers. If you do
use CTEs in your query, try changing them from "WITH x AS (...)" to
"WITH x AS MATERIALIZED (...)" to return to the previous behaviour and
see if that improves things.

I don't know if I will be allowed to publicly post the actual query. I can
say it's quite complicated, and involves the following features:

* an inner SELECT DISTINCT from a particular primary table and some joins
* a join to the same primary table in the outer query, among others
* a join to a view in the outer query
* said view uses CTEs like you say, though based on unrelated tables
* then said view makes a UNION of 2 different SELECTs
* each of these again joins to the same primary table

So I'm not surprised the query planner gets confused. And your trick works.
Materializing those CTEs makes the query planner choose the hash join again.

I still think there is a bug here though. The query planner is getting
something really wrong, and at least now we know it's a consequence of
trying to optimize through the CTEs.

I will see what I can do about sharing the actual query. If this is
interesting to a pg developer, you're welcome to contact me.

Best,
Tavin

#4 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tavin Cole (#3)
Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

On Mon, Feb 22, 2021 at 3:16 PM Tavin Cole <tavin.cole@gmail.com> wrote:

On Mon, Feb 22, 2021 at 11:56 AM Magnus Hagander <magnus@hagander.net>
wrote:

You'll have to post your actual query to get proper feedback.

That said, a typical case for this specifically when going to version
12 would be that CTEs are no longer optimization barriers. If you do
use CTEs in your query, try changing them from "WITH x AS (...)" to
"WITH x AS MATERIALIZED (...)" to return to the previous behaviour and
see if that improves things.

I don't know if I will be allowed to publicly post the actual query. I can
say it's quite complicated, and involves the following features:

* an inner SELECT DISTINCT from a particular primary table and some joins
* a join to the same primary table in the outer query, among others
* a join to a view in the outer query
* said view uses CTEs like you say, though based on unrelated tables
* then said view makes a UNION of 2 different SELECTs
* each of these again joins to the same primary table

So I'm not surprised the query planner gets confused. And your trick
works. Materializing those CTEs makes the query planner choose the hash
join again.

I still think there is a bug here though. The query planner is getting
something really wrong, and at least now we know it's a consequence of
trying to optimize through the CTEs.

I will see what I can do about sharing the actual query. If this is
interesting to a pg developer, you're welcome to contact me.

The optimizer is driven by estimates. With bad estimates, the planner's result
is essentially random: sometimes better, sometimes worse. A materialized CTE
works as an optimization barrier. In cases like this it can help, and in other
cases using a materialized CTE can be worse.

You don't need to share the query - just send an execution plan.
https://explain.depesz.com/ can anonymize the execution plan if
necessary.
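
For reference, a plan like the ones requested here is produced with something like:

```sql
-- ANALYZE actually executes the query; BUFFERS adds I/O detail.
EXPLAIN (ANALYZE, BUFFERS) SELECT 1;  -- substitute the slow query for SELECT 1
```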

Regards

Pavel


#5 Tavin Cole
tavin.cole@gmail.com
In reply to: Pavel Stehule (#4)
Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

On Mon, Feb 22, 2021 at 3:21 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

You don't need to share the query - just send an execution plan.
https://explain.depesz.com/ can anonymize the execution plan if
necessary.

v11 plan: https://explain.depesz.com/s/P5Jj
v12 plan: https://explain.depesz.com/s/KPr

Cheers,
Tavin

#6 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tavin Cole (#5)
Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

On Mon, Feb 22, 2021 at 3:25 PM Tavin Cole <tavin.cole@gmail.com> wrote:

On Mon, Feb 22, 2021 at 3:21 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

You don't need to share the query - just send an execution plan.
https://explain.depesz.com/ can anonymize the execution plan if
necessary.

v11 plan: https://explain.depesz.com/s/P5Jj
v12 plan: https://explain.depesz.com/s/KPr

You can see a lot of underestimates and overestimates:

Hash Join
<http://www.depesz.com/2013/05/09/explaining-the-unexplainable-part-3/#hash-join>
(cost=15,018.84..89,129.05 rows=84,049 width=220)
(actual time=2,930.751..7,516.315 rows=778,056 loops=1)

But the significant problem is probably the underestimate in the aggregation:

HashAggregate
<http://www.depesz.com/2013/05/09/explaining-the-unexplainable-part-3/#hash-aggregate>
(cost=275,174.24..278,605.86 rows=343,162 width=487)
(actual time=269.775..1,943.691 rows=1,053,462 loops=57)

The estimated cost is lower than the real cost, and then a nested loop can
look cheaper than it really is.

Regards

Pavel


#7 Tavin Cole
tavin.cole@gmail.com
In reply to: Pavel Stehule (#6)
Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

On Mon, Feb 22, 2021 at 4:37 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

You can see a lot of underestimates and overestimates:

Hash Join
<http://www.depesz.com/2013/05/09/explaining-the-unexplainable-part-3/#hash-join>
(cost=15,018.84..89,129.05 rows=84,049 width=220)
(actual time=2,930.751..7,516.315 rows=778,056 loops=1)

But the significant problem is probably the underestimate in the aggregation:

HashAggregate
<http://www.depesz.com/2013/05/09/explaining-the-unexplainable-part-3/#hash-aggregate>
(cost=275,174.24..278,605.86 rows=343,162 width=487)
(actual time=269.775..1,943.691 rows=1,053,462 loops=57)

The estimated cost is lower than the real cost, and then a nested loop can
look cheaper than it really is.

Yes, this must be the union in the view I mentioned. Both halves of the
union select from the same table with different where clauses, in such a
way that together virtually all of the 1+ million rows are returned. But it
seems the planner estimates that a majority of the rows will be filtered
out before applying the union.
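
The shape being described is roughly the following (all names hypothetical):

```sql
CREATE VIEW v AS
SELECT id, payload FROM events WHERE archived        -- each branch looks selective
UNION                                                -- to the planner in isolation,
SELECT id, payload FROM events WHERE NOT archived;   -- yet together they return nearly every row
```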

Now I see that the planner makes the same miscalculation in the v11 plan,
but at a point in the execution plan where it is harmless.

It seems postgres was never able to successfully estimate this query, but
in the past we got lucky with the optimization barrier created by the CTEs.

I wonder whether this is a situation the planner might be able to estimate
successfully in the future, with some improvements, or whether it is just
impossible because the SQL is unoptimized.

Thanks all for the help.

Best,
Tavin

#8 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tavin Cole (#7)
Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

On Mon, Feb 22, 2021 at 8:31 PM Tavin Cole <tavin.cole@gmail.com> wrote:

On Mon, Feb 22, 2021 at 4:37 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

You can see a lot of underestimates and overestimates:

Hash Join
<http://www.depesz.com/2013/05/09/explaining-the-unexplainable-part-3/#hash-join>
(cost=15,018.84..89,129.05 rows=84,049 width=220)
(actual time=2,930.751..7,516.315 rows=778,056 loops=1)

But the significant problem is probably the underestimate in the aggregation:

HashAggregate
<http://www.depesz.com/2013/05/09/explaining-the-unexplainable-part-3/#hash-aggregate>
(cost=275,174.24..278,605.86 rows=343,162 width=487)
(actual time=269.775..1,943.691 rows=1,053,462 loops=57)

The estimated cost is lower than the real cost, and then a nested loop can
look cheaper than it really is.

Yes, this must be the union in the view I mentioned. Both halves of the
union select from the same table with different where clauses, in such a
way that together virtually all of the 1+ million rows are returned. But it
seems the planner estimates that a majority of the rows will be filtered
out before applying the union.

Now I see that the planner makes the same miscalculation in the v11 plan,
but at a point in the execution plan where it is harmless.

It seems postgres was never able to successfully estimate this query, but
in the past we got lucky with the optimization barrier created by the CTEs.

I wonder whether this is a situation the planner might be able to estimate
successfully in the future, with some improvements, or whether it is just
impossible because the SQL is unoptimized.

It is a difficult question - some estimation errors can be fixed, some cannot.
It depends on the data and the data model (some data models are unhappily
designed). Sometimes it is better to break a query into subqueries, store the
results in temp tables, run ANALYZE over those temp tables, and then continue
the query. This usually fixes estimation errors well, but temp tables have
some overhead.
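
The temp-table approach might look like this sketch (names hypothetical):

```sql
-- Materialize the problematic subquery so the planner sees real numbers.
CREATE TEMP TABLE recent_orders AS
SELECT * FROM orders WHERE created_at > now() - interval '30 days';

ANALYZE recent_orders;  -- real row counts and distributions for the planner

-- Continue the original query against the analyzed temp table.
SELECT c.name, count(*)
FROM recent_orders r
JOIN customers c ON c.id = r.customer_id
GROUP BY c.name;
```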

Tomas Vondra is working on expression statistics, and I can say that
estimation has made big progress in the last three years. But the precision
of any estimation will always be limited.
