Incorrect cost for MergeAppend
Hello hackers,
While investigating some query plans, I noticed some code that seems
to be wrong: when create_merge_append_path() estimates the cost of
sorting an input, it calls cost_sort() passing subpath->parent->tuples
as the number of tuples. Shouldn't it use subpath->parent->rows or
even subpath->rows instead? The `tuples` variable doesn't account for
the filters on the relation, so this leads to incorrect cost estimates
when there are highly selective filters, and Sort + Append is chosen
instead of MergeAppend.
I don't have a good repro yet because the plans I've been studying
involve a third-party extension, but the code looks incorrect so I
thought I should ask.
Here is a link to this code on GitHub:
https://github.com/postgres/postgres/blob/6a1ea02c491d16474a6214603dce40b5b122d4d1/src/backend/optimizer/util/pathnode.c#L1469-L1477
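For reference, the call in question looks roughly like this (paraphrased from the linked revision; the first and last arguments are reconstructed from memory, so the exact context may differ slightly):

```
/* In create_merge_append_path(), costing an explicit sort of a subpath
 * that is not already ordered by the required pathkeys: */
cost_sort(&sort_path,
          root,
          pathkeys,
          subpath->total_cost,
          subpath->parent->tuples,  /* raw relation tuple count; ignores
                                     * the selectivity of any filters */
          subpath->pathtarget->width,
          0.0,
          work_mem,
          pathnode->limit_tuples);
```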
---
Alexander Kuzmenkov
Timescale
On Mon, Jan 29, 2024 at 6:11 PM Alexander Kuzmenkov
<akuzmenkov@timescale.com> wrote:
Hello hackers,
While investigating some query plans, I noticed some code that seems
to be wrong: when create_merge_append_path() estimates the cost of
sorting an input, it calls cost_sort() passing subpath->parent->tuples
as the number of tuples. Shouldn't it use subpath->parent->rows or
even subpath->rows instead? The `tuples` variable doesn't account for
the filters on the relation, so this leads to incorrect cost estimates
when there are highly selective filters, and Sort + Append is chosen
instead of MergeAppend.
All other callers of cost_sort() except plan_cluster_use_sort() are
using rows instead of tuples. Even plan_cluster_use_sort() has
rel->rows = rel->tuples, so it's actually passing rows. So I agree with
your suggestion. However, a test would be good since this code is quite
old.
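For comparison, a typical caller such as create_sort_path() uses the subpath's filtered row estimate, roughly like this (paraphrased from memory, so the exact context may differ):

```
/* In create_sort_path(): the row estimate, not the raw relation
 * tuple count, drives the sort cost. */
cost_sort(&pathnode->path, root, pathkeys,
          subpath->total_cost,
          subpath->rows,
          subpath->pathtarget->width,
          0.0,
          work_mem,
          limit_tuples);
```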
--
Best Wishes,
Ashutosh Bapat
Here is a small patch that reproduces the problem on two tables with
inheritance, and fixes it. I'll add it to the Commitfest.
On Tue, Jan 30, 2024 at 8:20 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Mon, Jan 29, 2024 at 6:11 PM Alexander Kuzmenkov
<akuzmenkov@timescale.com> wrote:
Hello hackers,
While investigating some query plans, I noticed some code that seems
to be wrong: when create_merge_append_path() estimates the cost of
sorting an input, it calls cost_sort() passing subpath->parent->tuples
as the number of tuples. Shouldn't it use subpath->parent->rows or
even subpath->rows instead? The `tuples` variable doesn't account for
the filters on the relation, so this leads to incorrect cost estimates
when there are highly selective filters, and Sort + Append is chosen
instead of MergeAppend.
All other callers of cost_sort() except plan_cluster_use_sort() are
using rows instead of tuples. Even plan_cluster_use_sort() has
rel->rows = rel->tuples, so it's actually passing rows. So I agree with
your suggestion. However, a test would be good since this code is quite
old.
--
Best Wishes,
Ashutosh Bapat
Attachments:
merge-append-cost-v1.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 8dbf790e89..2e1ec41a54 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1470,7 +1470,7 @@ create_merge_append_path(PlannerInfo *root,
root,
pathkeys,
subpath->total_cost,
- subpath->parent->tuples,
+ subpath->rows,
subpath->pathtarget->width,
0.0,
work_mem,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 56a40d50f9..22e9ca0b10 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1716,6 +1716,28 @@ order by t1.b limit 10;
reset enable_nestloop;
drop table matest0 cascade;
NOTICE: drop cascades to table matest1
+-- Check that merge append is chosen in presence of filters.
+create table ma0(a int primary key);
+create table ma1() inherits (ma0);
+insert into ma0 select generate_series(1, 10000);
+insert into ma1 select generate_series(1, 10000);
+analyze ma0;
+analyze ma1;
+explain (costs off) select * from ma0 where a < 1000 order by a;
+ QUERY PLAN
+---------------------------------------------------
+ Merge Append
+ Sort Key: ma0.a
+ -> Index Only Scan using ma0_pkey on ma0 ma0_1
+ Index Cond: (a < 1000)
+ -> Sort
+ Sort Key: ma0_2.a
+ -> Seq Scan on ma1 ma0_2
+ Filter: (a < 1000)
+(8 rows)
+
+drop table ma0 cascade;
+NOTICE: drop cascades to table ma1
--
-- Test merge-append for UNION ALL append relations
--
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 971aac54fd..811da462b7 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -624,6 +624,19 @@ reset enable_nestloop;
drop table matest0 cascade;
+-- Check that merge append is chosen in presence of filters.
+create table ma0(a int primary key);
+create table ma1() inherits (ma0);
+insert into ma0 select generate_series(1, 10000);
+insert into ma1 select generate_series(1, 10000);
+analyze ma0;
+analyze ma1;
+
+explain (costs off) select * from ma0 where a < 1000 order by a;
+
+drop table ma0 cascade;
+
+
--
-- Test merge-append for UNION ALL append relations
--
Hi,
Here is a small patch that reproduces the problem on two tables with
inheritance, and fixes it. I'll add it to the Commitfest.
Thanks for the patch.
I can confirm that it changes the plan from Sort + Append to MergeAppend.
Before:
```
explain (costs off) select * from ma0 where a < 1000 order by a;
QUERY PLAN
---------------------------------------------------------
Sort
Sort Key: ma0.a
-> Append
-> Index Only Scan using ma0_pkey on ma0 ma0_1
Index Cond: (a < 1000)
-> Seq Scan on ma1 ma0_2
Filter: (a < 1000)
```
After:
```
=# explain (costs off) select * from ma0 where a < 1000 order by a;
QUERY PLAN
---------------------------------------------------
Merge Append
Sort Key: ma0.a
-> Index Only Scan using ma0_pkey on ma0 ma0_1
Index Cond: (a < 1000)
-> Sort
Sort Key: ma0_2.a
-> Seq Scan on ma1 ma0_2
Filter: (a < 1000)
```
The rest of the tests pass.
--
Best regards,
Aleksander Alekseev
On Wed, 31 Jan 2024 at 00:06, Alexander Kuzmenkov
<akuzmenkov@timescale.com> wrote:
Here is a small patch that reproduces the problem on two tables with
inheritance, and fixes it. I'll add it to the Commitfest.
Good catch.
It seems to have been broken since MergeAppends were added in 11cad29c9.
The code fix looks good to me.
The tests look a bit heavy. I wondered if you could have used a UNION
ALL query with the tenk1 table so you didn't have to create tables.
However, I see that case works ok since the parent->tuples is set in
set_subquery_size_estimates() from "rel->tuples =
sub_final_rel->cheapest_total_path->rows;".
I played around with the following and can see your patch changes the
plan from an Append -> Sort plan to a Merge Append.
create table tenk_parent (like tenk1);
alter table tenk1 inherit tenk_parent;
alter table tenk2 inherit tenk_parent;
explain (costs off) select * from tenk_parent where thousand = 0 order
by tenthous;
alter table tenk1 no inherit tenk_parent;
alter table tenk2 no inherit tenk_parent;
I'm just not sure if we should be messing with tenk1 and tenk2 like that.
You should likely focus on trying to find a test that does not require
making 2 tables with 10k rows.
David
On Tue, Jan 30, 2024 at 1:20 PM David Rowley <dgrowleyml@gmail.com> wrote:
You should likely focus on trying to find a test that does not require
making 2 tables with 10k rows.
Is 1k smallint OK? It should fit in one page. Still reproduces the
error, and the entire test case runs in under 10 ms.
Attachments:
merge-append-cost-v2.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 8dbf790e89..2e1ec41a54 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1470,7 +1470,7 @@ create_merge_append_path(PlannerInfo *root,
root,
pathkeys,
subpath->total_cost,
- subpath->parent->tuples,
+ subpath->rows,
subpath->pathtarget->width,
0.0,
work_mem,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 56a40d50f9..7804333399 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1716,6 +1716,28 @@ order by t1.b limit 10;
reset enable_nestloop;
drop table matest0 cascade;
NOTICE: drop cascades to table matest1
+-- Check that merge append is chosen in presence of filters.
+create table ma0(a smallint primary key);
+create table ma1() inherits (ma0);
+insert into ma0 select generate_series(1, 1000);
+insert into ma1 select generate_series(1, 1000);
+analyze ma0;
+analyze ma1;
+explain (costs off) select * from ma0 where a < 100 order by a;
+ QUERY PLAN
+---------------------------------------------------
+ Merge Append
+ Sort Key: ma0.a
+ -> Index Only Scan using ma0_pkey on ma0 ma0_1
+ Index Cond: (a < 100)
+ -> Sort
+ Sort Key: ma0_2.a
+ -> Seq Scan on ma1 ma0_2
+ Filter: (a < 100)
+(8 rows)
+
+drop table ma0 cascade;
+NOTICE: drop cascades to table ma1
--
-- Test merge-append for UNION ALL append relations
--
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 971aac54fd..dae8cc7dbe 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -624,6 +624,19 @@ reset enable_nestloop;
drop table matest0 cascade;
+-- Check that merge append is chosen in presence of filters.
+create table ma0(a smallint primary key);
+create table ma1() inherits (ma0);
+insert into ma0 select generate_series(1, 1000);
+insert into ma1 select generate_series(1, 1000);
+analyze ma0;
+analyze ma1;
+
+explain (costs off) select * from ma0 where a < 100 order by a;
+
+drop table ma0 cascade;
+
+
--
-- Test merge-append for UNION ALL append relations
--
On Wed, 31 Jan 2024 at 02:23, Alexander Kuzmenkov
<akuzmenkov@timescale.com> wrote:
On Tue, Jan 30, 2024 at 1:20 PM David Rowley <dgrowleyml@gmail.com> wrote:
You should likely focus on trying to find a test that does not require
making 2 tables with 10k rows.
Is 1k smallint OK? It should fit in one page. Still reproduces the
error, and the entire test case runs in under 10 ms.
I had a go at making it a bit smaller without going dangerously close
to where the plan might change. The following seems to work.
create table ma0(a int primary key);
create table ma1() inherits (ma0);
insert into ma0 select generate_series(1, 400);
insert into ma1 select generate_series(1, 200);
analyze ma0;
analyze ma1;
explain (costs off) select * from ma0 where a < 100 order by a;
drop table ma0 cascade;
As for backpatching this: it does risk plans changing in stable
versions of PostgreSQL, which we normally try to avoid. However, it
seems quite similar to 1e731ed12a, albeit far more long-standing.
I'm currently thinking we should backpatch this back to 12.
Does anyone disagree?
David
On Wed, Jan 31, 2024 at 4:33 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 31 Jan 2024 at 02:23, Alexander Kuzmenkov
<akuzmenkov@timescale.com> wrote:
On Tue, Jan 30, 2024 at 1:20 PM David Rowley <dgrowleyml@gmail.com> wrote:
You should likely focus on trying to find a test that does not require
making 2 tables with 10k rows.
Is 1k smallint OK? It should fit in one page. Still reproduces the
error, and the entire test case runs in under 10 ms.
I had a go at making it a bit smaller without going dangerously close
to where the plan might change. The following seems to work.
create table ma0(a int primary key);
create table ma1() inherits (ma0);
insert into ma0 select generate_series(1, 400);
insert into ma1 select generate_series(1, 200);
analyze ma0;
analyze ma1;
explain (costs off) select * from ma0 where a < 100 order by a;
drop table ma0 cascade;
On my laptop with all default settings
with patch
postgres@231714=#explain select * from ma0 where a < 100 order by a;
QUERY PLAN
--------------------------------------------------------------------------------------
Merge Append (cost=6.94..18.90 rows=198 width=4)
Sort Key: ma0.a
-> Index Only Scan using ma0_pkey on ma0 ma0_1 (cost=0.15..9.88
rows=99 width=4)
Index Cond: (a < 100)
-> Sort (cost=6.78..7.03 rows=99 width=4)
Sort Key: ma0_2.a
-> Seq Scan on ma1 ma0_2 (cost=0.00..3.50 rows=99 width=4)
Filter: (a < 100)
(8 rows)
without patch
#explain select * from ma0 where a < 100 order by a;
QUERY PLAN
----------------------------------------------------------------------
Sort (cost=19.04..19.54 rows=198 width=4)
Sort Key: ma0.a
-> Append (cost=0.00..11.49 rows=198 width=4)
-> Seq Scan on ma0 ma0_1 (cost=0.00..7.00 rows=99 width=4)
Filter: (a < 100)
-> Seq Scan on ma1 ma0_2 (cost=0.00..3.50 rows=99 width=4)
Filter: (a < 100)
(7 rows)
postgres@233864=#select (19.54 - 18.90)/19.54, (19.04 - 18.09)/19.04;
?column? | ?column?
------------------------+------------------------
0.03275332650972364381 | 0.04989495798319327731
(1 row)
Those numbers are higher than 1% (#define STD_FUZZ_FACTOR 1.01), but
a slight variation in the GUCs that affect cost might bring the
difference closer to STD_FUZZ_FACTOR.
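To illustrate what that fuzz factor means in practice, here is a simplified sketch of the idea; it is not the actual compare_path_costs_fuzzily() code, just an illustration of how a roughly 1% margin decides which path add_path() keeps:

```
#include <stdbool.h>

#define STD_FUZZ_FACTOR 1.01    /* same value as the define cited above */

/*
 * Illustrative only: one cost counts as fuzzily greater than another
 * only if it exceeds the other by more than the fuzz factor, so cost
 * differences of a few percent are enough to change the chosen plan.
 */
bool
cost_fuzzily_greater(double cost1, double cost2)
{
    return cost1 > cost2 * STD_FUZZ_FACTOR;
}
```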
Given how close they are, maybe it's not such a good idea to
backpatch. If the plans change for the better, users won't complain, but
otherwise they will complain and will have no way to go back to their
good plans in production.
--
Best Wishes,
Ashutosh Bapat
On Wed, 31 Jan 2024 at 18:58, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
with patch
Merge Append (cost=6.94..18.90 rows=198 width=4)
without patch
Sort (cost=19.04..19.54 rows=198 width=4)
Those numbers are higher than 1% (#define STD_FUZZ_FACTOR 1.01), but
a slight variation in the GUCs that affect cost might bring the
difference closer to STD_FUZZ_FACTOR.
Given how close they are, maybe it's not such a good idea to
backpatch.
The reason those numbers are close is that I reduced the row count
on the test tables to a point where we only just get the Merge Append
plan with a small margin. I don't see the test case costs as a
relevant factor in whether we backpatch or not.
What is relevant are things like:
For:
* It's a clear bug and what's happening now is clearly wrong.
* inheritance/partitioned table plan changes for the better in minor versions
Against:
* Nobody has complained for 13 years, so maybe it's unlikely anyone is
suffering too much.
* Possibility of inheritance/partitioned table plans changing for the
worse in minor versions
For now, I'm on the fence on this one.
David
On Wed, Jan 31, 2024 at 12:12 PM David Rowley <dgrowleyml@gmail.com> wrote:
What is relevant are things like:
For:
* It's a clear bug and what's happening now is clearly wrong.
* inheritance/partitioned table plan changes for the better in minor versions
Against:
* Nobody has complained for 13 years, so maybe it's unlikely anyone is
suffering too much.
* Possibility of inheritance/partitioned table plans changing for the
worse in minor versions
That's what I am thinking as well. And the plans that may change for
the worse are the ones where the costs with and without the patch are
close.
Just to be clear, the change is a good one and should be committed to
master. It's the backpatching I am worried about.
--
Best Wishes,
Ashutosh Bapat
I'd be happy to see this backpatched. What kind of regressions are we
worried about? I'd say partition-wise sort + merge should be faster
than append + sort for reasonably sized tables. That's basically what
tuplesort does inside. Moreover, this can enable index scans on
partitions, which is an even better plan.
On Wed, Jan 31, 2024 at 7:46 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Wed, Jan 31, 2024 at 12:12 PM David Rowley <dgrowleyml@gmail.com> wrote:
What is relevant are things like:
For:
* It's a clear bug and what's happening now is clearly wrong.
* inheritance/partitioned table plan changes for the better in minor versions
Against:
* Nobody has complained for 13 years, so maybe it's unlikely anyone is
suffering too much.
* Possibility of inheritance/partitioned table plans changing for the
worse in minor versions
That's what I am thinking as well. And the plans that may change for
the worse are the ones where the costs with and without the patch are
close.
Just to be clear, the change is a good one and should be committed to
master. It's the backpatching I am worried about.
--
Best Wishes,
Ashutosh Bapat
On Wed, Jan 31, 2024 at 1:33 PM Alexander Kuzmenkov
<akuzmenkov@timescale.com> wrote:
I'd be happy to see this backpatched. What kind of regressions are we
worried about? I'd say partition-wise sort + merge should be faster
than append + sort for reasonably sized tables. That's basically what
tuplesort does inside. Moreover, this can enable index scans on
partitions, which is an even better plan.
To put it another way, this change enables our usual cost model for
MergeAppend to work correctly in the presence of filters. I think we
can consider this model to be reasonably correct, and we don't
currently have major problems with MergeAppend being chosen instead of
Sort + Append in cases where it's suboptimal, right? So applying it
properly in cases with filters is not likely to introduce problems.
On 2024-Jan-31, Alexander Kuzmenkov wrote:
To put it another way, this change enables our usual cost model for
MergeAppend to work correctly in the presence of filters. I think we
can consider this model to be reasonably correct, and we don't
currently have major problems with MergeAppend being chosen instead of
Sort + Append in cases where it's suboptimal, right? So applying it
properly in cases with filters is not likely to introduce problems.
Since we have a minor coming up very soon, I think it's not a good idea
to backpatch right now. Maybe you can push to master now, and consider
whether to backpatch later.
The problem is -- if somebody has an application that gets good plans
with the current cost model, and you change the cost model and the plans
become worse, what do they do? If you change this in a major release,
this is not an issue because they must test their queries before
upgrading and if they fail to realize a problem exists then it's their
fault. If you change it in a minor release, then those people will be
very upset that things were changed suddenly, and they may get wary of
future minor upgrades, which we don't want.
Plus, they would need to do careful testing before doing the minor
upgrade.
Maybe plans can only go from bad to good and never from good to bad.
But are we 100% certain that that is the case?
People who are **desperate** to get this improvement can perhaps run a
patched version in the meantime.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Hi,
Since we have a minor coming up very soon, I think it's not a good idea
to backpatch right now. Maybe you can push to master now, and consider
whether to backpatch later.
The problem is -- if somebody has an application that gets good plans
with the current cost model, and you change the cost model and the plans
become worse, what do they do? If you change this in a major release,
this is not an issue because they must test their queries before
upgrading and if they fail to realize a problem exists then it's their
fault. If you change it in a minor release, then those people will be
very upset that things were changed suddenly, and they may get wary of
future minor upgrades, which we don't want.
I agree with this, especially as we tell our customers that such changes do not happen from one minor release to another.
Regards
Daniel
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Since we have a minor coming up very soon, I think it's not a good idea
to backpatch right now. Maybe you can push to master now, and consider
whether to backpatch later.
As a rule, we don't back-patch changes like this ever. People don't
appreciate plans changing in minor releases.
regards, tom lane
On Thu, 1 Feb 2024 at 04:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Since we have a minor coming up very soon, I think it's not a good idea
to backpatch right now. Maybe you can push to master now, and consider
whether to backpatch later.
As a rule, we don't back-patch changes like this ever. People don't
appreciate plans changing in minor releases.
Pushed to master.
Thanks for the report and the fix, Alexander.