PATCH: generate fractional cheapest paths in generate_orderedappend_path

Started by Tomas Vondraover 4 years ago19 messages
#1Tomas Vondra
tomas.vondra@enterprisedb.com
2 attachment(s)

Hi,

This patch is a WIP fix for the issue described in [1]/messages/by-id/011937a3-7427-b99f-13f1-c07a127cf94c@enterprisedb.com, where the
planner picks a more expensive plan with partition-wise joins enabled,
and disabling this option produces a cheaper plan. That's a bit strange
because with the option disabled we consider *fewer* plans, so we should
not be able to generate a cheaper plan.

The problem lies in generate_orderedappend_paths(), which builds two
types of append paths - with minimal startup cost, and with minimal
total cost. That however does not work for queries with LIMIT, which
also need to consider at fractional cost, but the path interesting from
this perspective may be eliminated by other paths.

Consider three paths (this comes from the reproducer shared in [1]/messages/by-id/011937a3-7427-b99f-13f1-c07a127cf94c@enterprisedb.com):

A: nestloop_path startup 0.585000 total 35708.292500
B: nestloop_path startup 0.292500 total 150004297.292500
C: mergejoin_path startup 9748.112737 total 14102.092737

With some reasonable LIMIT value (e.g. 5% of the data), we really want
to pick path A. But that path is dominated both in startup cost (by B)
and total cost (path C). Hence generate_orderedappend_paths() will
ignore it, and we'll generate a more expensive LIMIT plan.

In [2]/messages/by-id/4006636.1618577893@sss.pgh.pa.us Tom proposed to modify generate_orderedappend_paths() to also
consider the fractional cheapest_path, just like we do for startup and
total costs. The attached patch does exactly that, and it does seem to
do the trick.

There are some loose ends, though:

1) If get_cheapest_fractional_path_for_pathkeys returns NULL, it's not
clear whether to default to cheapest_startup or cheapest_total. We might
also consider an incremental sort, in which case the startup cost
matters (for Sort it does not). So we may need an enhanced version of
get_cheapest_fractional_path_for_pathkeys that generates such paths.

2) Same for the cheapest_total - maybe there's a partially sorted path,
and using it with incremental sort on top would be better than using
cheapest_total_path + sort.

3) Not sure if get_cheapest_fractional_path_for_pathkeys should worry
about require_parallel_safe, just like the other functions nearby.

I'd argue that this patch does not need to add the Incremental Sort
capabilities in (1) and (2) - it's just another place where we decided
not to consider this sort variant yet.

I'm not sure how much this depends on partition-wise join, and why
disabling it generates the right plan. The reproducer uses that, but
AFAICS generate_orderedappend_paths() works like this since 2010 (commit
11cad29c915). I'd bet the issue exists since then and it's possible to
get similar cases even without partition-wise join.

I can reproduce it on PostgreSQL 12, though (which however supports
partition-wise join).

Not sure whether this should be backpatched. We didn't get any reports
until now, so it doesn't seem like a pressing issue. OTOH most users
won't actually notice this, they'll just get worse plans without
realizing there's a better option.

regards

[1]: /messages/by-id/011937a3-7427-b99f-13f1-c07a127cf94c@enterprisedb.com
/messages/by-id/011937a3-7427-b99f-13f1-c07a127cf94c@enterprisedb.com

[2]: /messages/by-id/4006636.1618577893@sss.pgh.pa.us

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

merge-fraction-cheapest.patchtext/x-patch; charset=UTF-8; name=merge-fraction-cheapest.patchDownload
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index edba5e49a8..0284162034 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1716,6 +1716,7 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 		List	   *pathkeys = (List *) lfirst(lcp);
 		List	   *startup_subpaths = NIL;
 		List	   *total_subpaths = NIL;
+		List	   *fractional_subpaths = NIL;
 		bool		startup_neq_total = false;
 		ListCell   *lcr;
 		bool		match_partition_order;
@@ -1745,7 +1746,8 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 		{
 			RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
 			Path	   *cheapest_startup,
-					   *cheapest_total;
+					   *cheapest_total,
+					   *cheapest_fractional = NULL;
 
 			/* Locate the right paths, if they are available. */
 			cheapest_startup =
@@ -1761,6 +1763,21 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 											   TOTAL_COST,
 											   false);
 
+			/*
+			 * XXX strange that get_cheapest_fractional_path_for_pathkeys
+			 * does not have require_parallel_safe.
+			 */
+			if (root->tuple_fraction > 0)
+			{
+				double	path_fraction = 1.0 / root->tuple_fraction;
+
+				cheapest_fractional =
+					get_cheapest_fractional_path_for_pathkeys(childrel->pathlist,
+															  pathkeys,
+															  NULL,
+															  path_fraction);
+			}
+
 			/*
 			 * If we can't find any paths with the right order just use the
 			 * cheapest-total path; we'll have to sort it later.
@@ -1773,6 +1790,18 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 				Assert(cheapest_total->param_info == NULL);
 			}
 
+			/*
+			 * XXX Do we need to do something about cheapest_fractional?
+			 * It could be NULL if there are no properly sorted paths,
+			 * but then maybe just doing the sort is good enough.
+			 *
+			 * XXX Well, maybe we should not just grab cheapest_total_path
+			 * here, because we might use incremental sort on a path that
+			 * is not fully sorted.
+			 */
+			if ((root->tuple_fraction > 0) && !cheapest_fractional)
+				cheapest_fractional = cheapest_total;
+
 			/*
 			 * Notice whether we actually have different paths for the
 			 * "cheapest" and "total" cases; frequently there will be no point
@@ -1799,6 +1828,12 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 
 				startup_subpaths = lappend(startup_subpaths, cheapest_startup);
 				total_subpaths = lappend(total_subpaths, cheapest_total);
+
+				if (cheapest_fractional)
+				{
+					cheapest_fractional = get_singleton_append_subpath(cheapest_fractional);
+					fractional_subpaths = lappend(fractional_subpaths, cheapest_fractional);
+				}
 			}
 			else if (match_partition_order_desc)
 			{
@@ -1812,6 +1847,12 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 
 				startup_subpaths = lcons(cheapest_startup, startup_subpaths);
 				total_subpaths = lcons(cheapest_total, total_subpaths);
+
+				if (cheapest_fractional)
+				{
+					cheapest_fractional = get_singleton_append_subpath(cheapest_fractional);
+					fractional_subpaths = lcons(cheapest_fractional, fractional_subpaths);
+				}
 			}
 			else
 			{
@@ -1823,6 +1864,10 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 										  &startup_subpaths, NULL);
 				accumulate_append_subpath(cheapest_total,
 										  &total_subpaths, NULL);
+
+				if (cheapest_fractional)
+					accumulate_append_subpath(cheapest_fractional,
+											  &fractional_subpaths, NULL);
 			}
 		}
 
@@ -1849,6 +1894,18 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 														  0,
 														  false,
 														  -1));
+
+			/* XXX maybe we should have startup_new_fractional? */
+			if (fractional_subpaths)
+				add_path(rel, (Path *) create_append_path(root,
+														  rel,
+														  fractional_subpaths,
+														  NIL,
+														  pathkeys,
+														  NULL,
+														  0,
+														  false,
+														  -1));
 		}
 		else
 		{
@@ -1864,6 +1921,14 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 																total_subpaths,
 																pathkeys,
 																NULL));
+
+			/* XXX maybe we should have startup_new_fractional? */
+			if (fractional_subpaths)
+				add_path(rel, (Path *) create_merge_append_path(root,
+																rel,
+																fractional_subpaths,
+																pathkeys,
+																NULL));
 		}
 	}
 }
optimizer_first_rows_simple.sqlapplication/sql; name=optimizer_first_rows_simple.sqlDownload
#2Arne Roland
A.Roland@index.de
In reply to: Tomas Vondra (#1)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Hi,

thanks for looking into it!

For some reason the patch doesn't apply at my end, could you repost one based at the master?

1) If get_cheapest_fractional_path_for_pathkeys returns NULL, it's not
clear whether to default to cheapest_startup or cheapest_total. We might
also consider an incremental sort, in which case the startup cost
matters (for Sort it does not). So we may need an enhanced version of
get_cheapest_fractional_path_for_pathkeys that generates such paths.

2) Same for the cheapest_total - maybe there's a partially sorted path,
and using it with incremental sort on top would be better than using
cheapest_total_path + sort.

I'd argue that this patch does not need to add the Incremental Sort
capabilities in (1) and (2) - it's just another place where we decided
not to consider this sort variant yet.

I'd say your reasoning is sound. If I'd want to get better partial costs for incremental sorts, I'd look at get_cheapest_fractional_path first. That sounds more important than generate_orderedappend_paths. Either way I'd say that is a completely separate issue and I think that should be looked at separately.

3) Not sure if get_cheapest_fractional_path_for_pathkeys should worry

about require_parallel_safe, just like the other functions nearby.

I think it should. We have a ParallelAppend node after all.
I'm not really familiar with the way get_cheapest_fractional_path_for_pathkeys is used, but a quick search suggests to me, that build_minmax_path was thus far the only one using it. And minmax paths are never parallel safe anyway. I think that is the reason it doesn't do that already.

Regards

Arne

________________________________
From: Tomas Vondra <tomas.vondra@enterprisedb.com>
Sent: Saturday, April 17, 2021 1:52:19 AM
To: pgsql-hackers
Subject: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Hi,

This patch is a WIP fix for the issue described in [1]/messages/by-id/011937a3-7427-b99f-13f1-c07a127cf94c@enterprisedb.com, where the
planner picks a more expensive plan with partition-wise joins enabled,
and disabling this option produces a cheaper plan. That's a bit strange
because with the option disabled we consider *fewer* plans, so we should
not be able to generate a cheaper plan.

The problem lies in generate_orderedappend_paths(), which builds two
types of append paths - with minimal startup cost, and with minimal
total cost. That however does not work for queries with LIMIT, which
also need to consider at fractional cost, but the path interesting from
this perspective may be eliminated by other paths.

Consider three paths (this comes from the reproducer shared in [1]/messages/by-id/011937a3-7427-b99f-13f1-c07a127cf94c@enterprisedb.com):

A: nestloop_path startup 0.585000 total 35708.292500
B: nestloop_path startup 0.292500 total 150004297.292500
C: mergejoin_path startup 9748.112737 total 14102.092737

With some reasonable LIMIT value (e.g. 5% of the data), we really want
to pick path A. But that path is dominated both in startup cost (by B)
and total cost (path C). Hence generate_orderedappend_paths() will
ignore it, and we'll generate a more expensive LIMIT plan.

In [2]/messages/by-id/4006636.1618577893@sss.pgh.pa.us Tom proposed to modify generate_orderedappend_paths() to also
consider the fractional cheapest_path, just like we do for startup and
total costs. The attached patch does exactly that, and it does seem to
do the trick.

There are some loose ends, though:

1) If get_cheapest_fractional_path_for_pathkeys returns NULL, it's not
clear whether to default to cheapest_startup or cheapest_total. We might
also consider an incremental sort, in which case the startup cost
matters (for Sort it does not). So we may need an enhanced version of
get_cheapest_fractional_path_for_pathkeys that generates such paths.

2) Same for the cheapest_total - maybe there's a partially sorted path,
and using it with incremental sort on top would be better than using
cheapest_total_path + sort.

3) Not sure if get_cheapest_fractional_path_for_pathkeys should worry
about require_parallel_safe, just like the other functions nearby.

I'd argue that this patch does not need to add the Incremental Sort
capabilities in (1) and (2) - it's just another place where we decided
not to consider this sort variant yet.

I'm not sure how much this depends on partition-wise join, and why
disabling it generates the right plan. The reproducer uses that, but
AFAICS generate_orderedappend_paths() works like this since 2010 (commit
11cad29c915). I'd bet the issue exists since then and it's possible to
get similar cases even without partition-wise join.

I can reproduce it on PostgreSQL 12, though (which however supports
partition-wise join).

Not sure whether this should be backpatched. We didn't get any reports
until now, so it doesn't seem like a pressing issue. OTOH most users
won't actually notice this, they'll just get worse plans without
realizing there's a better option.

regards

[1]: /messages/by-id/011937a3-7427-b99f-13f1-c07a127cf94c@enterprisedb.com
/messages/by-id/011937a3-7427-b99f-13f1-c07a127cf94c@enterprisedb.com

[2]: /messages/by-id/4006636.1618577893@sss.pgh.pa.us

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Arne Roland
A.Roland@index.de
In reply to: Arne Roland (#2)
1 attachment(s)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Hi,

I haven't tested the parallel case, but I think we should sort out (3) get_cheapest_fractional_path_for_pathkeys as mentioned above.

I am lost about the comment regarding startup_new_fractional. Could you elaborate what you mean by that?

Apart from that, I'd argue for a small test case. I attached a slimmed down case of what we were trying to fix. It might be worth to integrate that with an existing test, since more than a third of the time seems to be consumed by the creation and attachment of partitions.

Regards
Arne

Attachments:

merge-fraction-cheapest_example.sqlapplication/sql; name=merge-fraction-cheapest_example.sqlDownload
#4Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Arne Roland (#3)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Hi,

On 6/3/21 7:17 PM, Arne Roland wrote:

Hi,

I haven't tested the parallel case, but I think we should sort out (3)
get_cheapest_fractional_path_for_pathkeys as mentioned above.

Not sure what you refer to by "above" - it's probably better to reply
in-line to existing message, which makes it much cleared.

I am lost about the comment regarding startup_new_fractional. Could you
elaborate what you mean by that?

Not sure what this refers to either - there's no startup_new_fractional
in my message and 'git grep startup_new_fractional' returns nothing.

Apart from that, I'd argue for a small test case. I attached a slimmed
down case of what we were trying to fix. It might be worth to integrate
that with an existing test, since more than a third of the time seems to
be consumed by the creation and attachment of partitions.

Maybe, if there's a suitable table to reuse, we can do that. But I don't
think it matters it takes ~1/3 of the time to attach the partitions.
What's more important is whether it measurably slows down the test
suite, and I don't think that's an issue.

In any case, this seems a bit premature - we need something to test the
patch etc. We can worry about how expensive the test is much later.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Zhihong Yu
zyu@yugabyte.com
In reply to: Tomas Vondra (#4)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

On Thu, Jun 3, 2021 at 11:12 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:

Hi,

On 6/3/21 7:17 PM, Arne Roland wrote:

Hi,

I haven't tested the parallel case, but I think we should sort out (3)
get_cheapest_fractional_path_for_pathkeys as mentioned above.

Not sure what you refer to by "above" - it's probably better to reply
in-line to existing message, which makes it much cleared.

I am lost about the comment regarding startup_new_fractional. Could you
elaborate what you mean by that?

Not sure what this refers to either - there's no startup_new_fractional
in my message and 'git grep startup_new_fractional' returns nothing.

Apart from that, I'd argue for a small test case. I attached a slimmed
down case of what we were trying to fix. It might be worth to integrate
that with an existing test, since more than a third of the time seems to
be consumed by the creation and attachment of partitions.

Maybe, if there's a suitable table to reuse, we can do that. But I don't
think it matters it takes ~1/3 of the time to attach the partitions.
What's more important is whether it measurably slows down the test
suite, and I don't think that's an issue.

In any case, this seems a bit premature - we need something to test the
patch etc. We can worry about how expensive the test is much later.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Hi,

In REL_11_STABLE branch, a search revealed the following:

src/backend/optimizer/path/pathkeys.c: *
get_cheapest_fractional_path_for_pathkeys
src/backend/optimizer/path/pathkeys.c:get_cheapest_fractional_path_for_pathkeys(List
*paths,
src/backend/optimizer/plan/planagg.c:
get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
src/include/optimizer/paths.h:extern Path
*get_cheapest_fractional_path_for_pathkeys(List *paths,

It seems this function has been refactored out in subsequent releases.

FYI

#6Zhihong Yu
zyu@yugabyte.com
In reply to: Zhihong Yu (#5)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

On Thu, Jun 3, 2021 at 1:50 PM Zhihong Yu <zyu@yugabyte.com> wrote:

On Thu, Jun 3, 2021 at 11:12 AM Tomas Vondra <
tomas.vondra@enterprisedb.com> wrote:

Hi,

On 6/3/21 7:17 PM, Arne Roland wrote:

Hi,

I haven't tested the parallel case, but I think we should sort out (3)
get_cheapest_fractional_path_for_pathkeys as mentioned above.

Not sure what you refer to by "above" - it's probably better to reply
in-line to existing message, which makes it much cleared.

I am lost about the comment regarding startup_new_fractional. Could you
elaborate what you mean by that?

Not sure what this refers to either - there's no startup_new_fractional
in my message and 'git grep startup_new_fractional' returns nothing.

Apart from that, I'd argue for a small test case. I attached a slimmed
down case of what we were trying to fix. It might be worth to integrate
that with an existing test, since more than a third of the time seems to
be consumed by the creation and attachment of partitions.

Maybe, if there's a suitable table to reuse, we can do that. But I don't
think it matters it takes ~1/3 of the time to attach the partitions.
What's more important is whether it measurably slows down the test
suite, and I don't think that's an issue.

In any case, this seems a bit premature - we need something to test the
patch etc. We can worry about how expensive the test is much later.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Hi,

In REL_11_STABLE branch, a search revealed the following:

src/backend/optimizer/path/pathkeys.c: *
get_cheapest_fractional_path_for_pathkeys
src/backend/optimizer/path/pathkeys.c:get_cheapest_fractional_path_for_pathkeys(List
*paths,
src/backend/optimizer/plan/planagg.c:
get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
src/include/optimizer/paths.h:extern Path
*get_cheapest_fractional_path_for_pathkeys(List *paths,

It seems this function has been refactored out in subsequent releases.

FYI

Sent a bit too soon.

The above function still exists.
But startup_new_fractional was nowhere to be found.

#7Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Zhihong Yu (#6)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

On 6/3/21 10:52 PM, Zhihong Yu wrote:

...

Sent a bit too soon.

The above function still exists.
But startup_new_fractional was nowhere to be found.

Actually, there are two comments

/* XXX maybe we should have startup_new_fractional? */

in the patch I posted - I completely forgot about that. But I think
that's a typo, I think - it should be

/* XXX maybe we should have startup_neq_fractional? */

and the new flag would work similarly to startup_neq_total, i.e. it's
pointless to add paths where startup == fractional cost.

At least I think that was the idea when I wrote the patch, it way too
long ago.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Arne Roland
A.Roland@index.de
In reply to: Tomas Vondra (#7)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Hi,

thanks for the quick reply!

From: Tomas Vondra <tomas.vondra@enterprisedb.com>
Sent: Thursday, June 3, 2021 20:11
To: Arne Roland; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

I haven't tested the parallel case, but I think we should sort out (3)
get_cheapest_fractional_path_for_pathkeys as mentioned above.

Not sure what you refer to by "above" - it's probably better to reply
in-line to existing message, which makes it much cleared.

I was referring to one message above. I thought the thread was still short enough. Apparently to much time has passed. Sorry, I hope this mail is better. I was referring to my post from April:
________________________________
From: Arne Roland
Sent: Monday, April 26, 2021 13:00
To: Tomas Vondra; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

3) Not sure if get_cheapest_fractional_path_for_pathkeys should worry

about require_parallel_safe, just like the other functions nearby.

I think it should. We have a ParallelAppend node after all.
I'm not really familiar with the way get_cheapest_fractional_path_for_pathkeys is used, but a quick search suggests to me, that build_minmax_path was thus far the only one using it. And minmax paths are never parallel safe anyway. I think that is the reason it doesn't do that already.

________________________________
From: Zhihong Yu <zyu@yugabyte.com>
Sent: Thursday, June 3, 2021 22:50
To: Tomas Vondra
Cc: Arne Roland; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Hi,
In REL_11_STABLE branch, a search revealed the following:

src/backend/optimizer/path/pathkeys.c: * get_cheapest_fractional_path_for_pathkeys
src/backend/optimizer/path/pathkeys.c:get_cheapest_fractional_path_for_pathkeys(List *paths,
src/backend/optimizer/plan/planagg.c: get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
src/include/optimizer/paths.h:extern Path *get_cheapest_fractional_path_for_pathkeys(List *paths,

It seems this function has been refactored out in subsequent releases.

FYI

Thanks for the info!
I doubt there is any interest to back patch this anywhere. My most ambitious dream would be getting this into pg 14.

I think, we only care about a parallel safety aware variant anyways, which afaict never existed.

________________________________
From: Tomas Vondra <tomas.vondra@enterprisedb.com>
Sent: Thursday, June 3, 2021 22:57
To: Zhihong Yu
Cc: Arne Roland; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
Actually, there are two comments

/* XXX maybe we should have startup_new_fractional? */

in the patch I posted - I completely forgot about that. But I think
that's a typo, I think - it should be

/* XXX maybe we should have startup_neq_fractional? */

and the new flag would work similarly to startup_neq_total, i.e. it's
pointless to add paths where startup == fractional cost.

At least I think that was the idea when I wrote the patch, it way too
long ago.

Sorry, I almost forgot about this myself. I only got reminded upon seeing that again with different queries/tables.
Just to be sure I get this correctly: You mean startup_gt_fractional (cost) as an additional condition, right?

Regards
Arne

#9Arne Roland
A.Roland@index.de
In reply to: Arne Roland (#8)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Hi Tomas,

I don't think there is much work left to do here.

Did you have a look at the test case? Did it make sense to you?

And I am sorry. I had another look at this and it seems I was confused (again).

From: Arne Roland
Sent: Monday, April 26, 2021 13:00
To: Tomas Vondra; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

I think it should. We have a ParallelAppend node after all.
I'm not really familiar with the way get_cheapest_fractional_path_for_pathkeys is used, but a quick search suggests to
me, that build_minmax_path was thus far the only one using it. And minmax paths are never parallel safe anyway. I think that is the reason it doesn't do that already.

The whole segment were are talking about obviously assumes require_parallel_safe is not needed. I wasn't aware that in set_append_rel_size. And I just realized there is a great comment explaining why it rightfully does so:
/*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
* not; but we don't have that today, so it's a waste to consider
* partial paths anywhere in the appendrel unless it's all safe.
* (Child rels visited before this one will be unmarked in
* set_append_rel_pathlist().)
*/
So afaik we don't need to think further about this.

From: Tomas Vondra <tomas.vondra@enterprisedb.com>
Sent: Thursday, June 3, 2021 22:57
To: Zhihong Yu
Cc: Arne Roland; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Actually, there are two comments

/* XXX maybe we should have startup_new_fractional? */

in the patch I posted - I completely forgot about that. But I think
that's a typo, I think - it should be

/* XXX maybe we should have startup_neq_fractional? */

and the new flag would work similarly to startup_neq_total, i.e. it's
pointless to add paths where startup == fractional cost.

At least I think that was the idea when I wrote the patch, it way too
long ago.

Sorry, I almost forgot about this myself. I only got reminded upon seeing that again with different queries/tables.
Just to be sure I get this correctly: You mean startup_gt_fractional (cost) as an additional condition, right?

Could you clarify that for me?

Regards
Arne

#10Arne Roland
A.Roland@index.de
In reply to: Arne Roland (#9)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Afaiac we should add a simple testcase here, like I suggested in 477344d5f17c4a8e95d3a5bb6642718a</messages/by-id/477344d5f17c4a8e95d3a5bb6642718a@index.de&gt;. Apart from that I am not sure there is work to be done here.

Am I wrong?

Regards

Arne

________________________________
From: Arne Roland <A.Roland@index.de>
Sent: Saturday, June 26, 2021 5:50:49 PM
To: Tomas Vondra
Cc: pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Hi Tomas,

I don't think there is much work left to do here.

Did you have a look at the test case? Did it make sense to you?

And I am sorry. I had another look at this and it seems I was confused (again).

From: Arne Roland
Sent: Monday, April 26, 2021 13:00
To: Tomas Vondra; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

I think it should. We have a ParallelAppend node after all.
I'm not really familiar with the way get_cheapest_fractional_path_for_pathkeys is used, but a quick search suggests to
me, that build_minmax_path was thus far the only one using it. And minmax paths are never parallel safe anyway. I think that is the reason it doesn't do that already.

The whole segment were are talking about obviously assumes require_parallel_safe is not needed. I wasn't aware that in set_append_rel_size. And I just realized there is a great comment explaining why it rightfully does so:
/*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
* not; but we don't have that today, so it's a waste to consider
* partial paths anywhere in the appendrel unless it's all safe.
* (Child rels visited before this one will be unmarked in
* set_append_rel_pathlist().)
*/
So afaik we don't need to think further about this.

From: Tomas Vondra <tomas.vondra@enterprisedb.com>
Sent: Thursday, June 3, 2021 22:57
To: Zhihong Yu
Cc: Arne Roland; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Actually, there are two comments

/* XXX maybe we should have startup_new_fractional? */

in the patch I posted - I completely forgot about that. But I think
that's a typo, I think - it should be

/* XXX maybe we should have startup_neq_fractional? */

and the new flag would work similarly to startup_neq_total, i.e. it's
pointless to add paths where startup == fractional cost.

At least I think that was the idea when I wrote the patch, it way too
long ago.

Sorry, I almost forgot about this myself. I only got reminded upon seeing that again with different queries/tables.
Just to be sure I get this correctly: You mean startup_gt_fractional (cost) as an additional condition, right?

Could you clarify that for me?

Regards
Arne

#11Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Arne Roland (#10)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Hi,

On 12/2/21 15:58, Arne Roland wrote:

Afaiac we should add a simple testcase here, like I suggested in
477344d5f17c4a8e95d3a5bb6642718a
</messages/by-id/477344d5f17c4a8e95d3a5bb6642718a@index.de&gt;.
Apart from that I am not sure there is work to be done here.

Well, I mentioned three open questions in my first message, and I don't
think we've really addressed them yet. We've agreed we don't need to add
the incremental sort here, but that leaves

1) If get_cheapest_fractional_path_for_pathkeys returns NULL, should we
default to cheapest_startup or cheapest_total?

2) Should get_cheapest_fractional_path_for_pathkeys worry about
require_parallel_safe? I think yes, but we need to update the patch.

I'll take a closer look next week, once I get home from NYC, and I'll
submit an improved version for the January CF.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Arne Roland
A.Roland@index.de
In reply to: Tomas Vondra (#11)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Hi,

thanks for the reply!

From: Tomas Vondra <tomas.vondra@enterprisedb.com>
Sent: Thursday, December 2, 2021 20:58
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

[...]
Well, I mentioned three open questions in my first message, and I don't
think we've really addressed them yet. We've agreed we don't need to add
the incremental sort here, but that leaves

1) If get_cheapest_fractional_path_for_pathkeys returns NULL, should we
default to cheapest_startup or cheapest_total?

I think it's reasonable to use cheapest_total like we are doing now. I hardly see any reason to change it.
The incremental sort case you mentioned, seems like the only case that plan might be interesting. If we really want that to happen, we probably should check for that separately, i.e. having startup_fractional. Even though this is a fairly special case as it's mostly relevant for partitionwise joins, I'm still not convinced it's worth the cpu cycles. The point is that in most cases factional and startup_fractional will be the same anyways.
And I suspect, even if they aren't, solving that from an application developers point of view, is in most cases not that difficult. One can usually match the pathkey. I personally had a lot of real world issues with missing fractional paths using primary keys. I think it's worth noting that everything will likely match the partition keys anyways, because otherwise there is no chance of doing a merge append.
If I am not mistaken, in case we end up doing a full sort, the cheapest_total path should be completely sufficient.

2) Should get_cheapest_fractional_path_for_pathkeys worry about
require_parallel_safe? I think yes, but we need to update the patch.

I admit, that such a version of get_cheapest_fractional_path_for_pathkeys could be consistent with other functions. And I was confused about that before. But I am not sure what to use require_parallel_safe for. build_minmax_path doesn't care, they are never parallel safe. And afaict generate_orderedappend_paths cares neither, it considers all plans. For instance when it calls get_cheapest_path_for_pathkeys, it sets require_parallel_safe just to false as well.

I'll take a closer look next week, once I get home from NYC, and I'll
submit an improved version for the January CF.

Thank you for your work! The current patch, apart from the comments/regression tests, seems pretty reasonable to me.

Regards
Arne

#13Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Arne Roland (#12)
1 attachment(s)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

On 12/10/21 00:51, Arne Roland wrote:

Hi,

thanks for the reply!

From: Tomas Vondra <tomas.vondra@enterprisedb.com>
Sent: Thursday, December 2, 2021 20:58
Subject: Re: PATCH: generate fractional cheapest paths in
generate_orderedappend_path

[...]
Well, I mentioned three open questions in my first message, and I don't
think we've really addressed them yet. We've agreed we don't need to add
the incremental sort here, but that leaves

1) If get_cheapest_fractional_path_for_pathkeys returns NULL, should we
default to cheapest_startup or cheapest_total?

I think it's reasonable to use cheapest_total like we are doing now. I
hardly see any reason to change it.

True, it's a reasonable first step.

Either we generate the same plan as today (with cheapest_total), or
maybe a better one (if we find a cheaper fractional path with matching
pathkeys). It's true we could do better, but that's life - it's not like
we consider every possible path everywhere.

The incremental sort case you mentioned, seems like the only case that
plan might be interesting. If we really want that to happen, we probably
should check for that separately, i.e. having startup_fractional. Even
though this is a fairly special case as it's mostly relevant for
partitionwise joins, I'm still not convinced it's worth the cpu cycles.
The point is that in most cases factional and startup_fractional will be
the same anyways.

I don't think we can simply check for startup_fractional (although, I'm
not sure I fully understand what would that be). I think what we should
really do in this case is walking all the paths, ensuring it's properly
sorted (with either full or incremental sort), and then pick the
cheapest fractional path from these sorted paths. But I agree that seems
pretty expensive.

And I suspect, even if they aren't, solving that from an application
developers point of view, is in most cases not that difficult. One can
usually match the pathkey. I personally had a lot of real world issues
with missing fractional paths using primary keys. I think it's worth
noting that everything will likely match the partition keys anyways,
because otherwise there is no chance of doing a merge append.

Yeah, I think you're right.

If I am not mistaken, in case we end up doing a full sort, the
cheapest_total path should be completely sufficient.

Definitely true.

2) Should get_cheapest_fractional_path_for_pathkeys worry about
require_parallel_safe? I think yes, but we need to update the patch.

I admit, that such a version of
get_cheapest_fractional_path_for_pathkeys could be consistent with other
functions. And I was confused about that before. But I am not sure what
to use require_parallel_safe for. build_minmax_path doesn't care, they
are never parallel safe. And afaict generate_orderedappend_paths cares
neither, it considers all plans. For instance when it calls
get_cheapest_path_for_pathkeys, it sets require_parallel_safe just to
false as well.

True as well. It's looks a bit strange, but you're right neither place
cares about parallel safety.

I'll take a closer look next week, once I get home from NYC, and I'll
submit an improved version for the January CF.

Thank you for your work! The current patch, apart from the
comments/regression tests, seems pretty reasonable to me.

Attached is a cleaned-up patch, with a simple regression test. I'll mark
this as RFC and get it committed in a couple days.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

merge-fraction-cheapest-20211211.patchtext/x-patch; charset=UTF-8; name=merge-fraction-cheapest-20211211.patchDownload
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 296dd75c1b..4a5f52287e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1716,6 +1716,7 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 		List	   *pathkeys = (List *) lfirst(lcp);
 		List	   *startup_subpaths = NIL;
 		List	   *total_subpaths = NIL;
+		List	   *fractional_subpaths = NIL;
 		bool		startup_neq_total = false;
 		ListCell   *lcr;
 		bool		match_partition_order;
@@ -1745,7 +1746,8 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 		{
 			RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
 			Path	   *cheapest_startup,
-					   *cheapest_total;
+					   *cheapest_total,
+					   *cheapest_fractional = NULL;
 
 			/* Locate the right paths, if they are available. */
 			cheapest_startup =
@@ -1773,6 +1775,33 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 				Assert(cheapest_total->param_info == NULL);
 			}
 
+			/*
+			 * When needed (building fractional path), determine the cheapest
+			 * fractional path too.
+			 */
+			if (root->tuple_fraction > 0)
+			{
+				double	path_fraction = (1.0 / root->tuple_fraction);
+
+				cheapest_fractional =
+					get_cheapest_fractional_path_for_pathkeys(childrel->pathlist,
+															  pathkeys,
+															  NULL,
+															  path_fraction);
+
+				/*
+				 * If we found no path with matching pathkeys, use the cheapest
+				 * total path instead.
+				 *
+				 * XXX We might be smarter and consider partially sorted paths,
+				 * with just an incremental sort on top. That might be cheaper
+				 * than using cheapest_total with regular sort, but we'd have
+				 * to build all the incremental paths.
+				 */
+				if (!cheapest_fractional)
+					cheapest_fractional = cheapest_total;
+			}
+
 			/*
 			 * Notice whether we actually have different paths for the
 			 * "cheapest" and "total" cases; frequently there will be no point
@@ -1799,6 +1828,12 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 
 				startup_subpaths = lappend(startup_subpaths, cheapest_startup);
 				total_subpaths = lappend(total_subpaths, cheapest_total);
+
+				if (cheapest_fractional)
+				{
+					cheapest_fractional = get_singleton_append_subpath(cheapest_fractional);
+					fractional_subpaths = lappend(fractional_subpaths, cheapest_fractional);
+				}
 			}
 			else if (match_partition_order_desc)
 			{
@@ -1812,6 +1847,12 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 
 				startup_subpaths = lcons(cheapest_startup, startup_subpaths);
 				total_subpaths = lcons(cheapest_total, total_subpaths);
+
+				if (cheapest_fractional)
+				{
+					cheapest_fractional = get_singleton_append_subpath(cheapest_fractional);
+					fractional_subpaths = lcons(cheapest_fractional, fractional_subpaths);
+				}
 			}
 			else
 			{
@@ -1823,6 +1864,10 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 										  &startup_subpaths, NULL);
 				accumulate_append_subpath(cheapest_total,
 										  &total_subpaths, NULL);
+
+				if (cheapest_fractional)
+					accumulate_append_subpath(cheapest_fractional,
+											  &fractional_subpaths, NULL);
 			}
 		}
 
@@ -1849,6 +1894,17 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 														  0,
 														  false,
 														  -1));
+
+			if (fractional_subpaths)
+				add_path(rel, (Path *) create_append_path(root,
+														  rel,
+														  fractional_subpaths,
+														  NIL,
+														  pathkeys,
+														  NULL,
+														  0,
+														  false,
+														  -1));
 		}
 		else
 		{
@@ -1864,6 +1920,13 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 																total_subpaths,
 																pathkeys,
 																NULL));
+
+			if (fractional_subpaths)
+				add_path(rel, (Path *) create_merge_append_path(root,
+																rel,
+																fractional_subpaths,
+																pathkeys,
+																NULL));
 		}
 	}
 }
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 27f7525b3e..bb5b7c47a4 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -4862,3 +4862,51 @@ SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2
   1 | 209 | 0009 |  1 | 209 | 0009
 (8 rows)
 
+-- partitionwise join with fractional paths
+CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
+CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000');
+CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000');
+-- insert data
+INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
+ANALYZE fract_t;
+-- verify plan; nested index only scans
+SET max_parallel_workers_per_gather = 0;
+SET enable_partitionwise_join = on;
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Limit
+   ->  Merge Append
+         Sort Key: x.id
+         ->  Merge Left Join
+               Merge Cond: (x_1.id = y_1.id)
+               ->  Index Only Scan using fract_t0_pkey on fract_t0 x_1
+               ->  Index Only Scan using fract_t0_pkey on fract_t0 y_1
+         ->  Merge Left Join
+               Merge Cond: (x_2.id = y_2.id)
+               ->  Index Only Scan using fract_t1_pkey on fract_t1 x_2
+               ->  Index Only Scan using fract_t1_pkey on fract_t1 y_2
+(11 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10;
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
+ Limit
+   ->  Merge Append
+         Sort Key: x.id DESC
+         ->  Nested Loop Left Join
+               ->  Index Only Scan Backward using fract_t0_pkey on fract_t0 x_1
+               ->  Index Only Scan using fract_t0_pkey on fract_t0 y_1
+                     Index Cond: (id = x_1.id)
+         ->  Nested Loop Left Join
+               ->  Index Only Scan Backward using fract_t1_pkey on fract_t1 x_2
+               ->  Index Only Scan using fract_t1_pkey on fract_t1 y_2
+                     Index Cond: (id = x_2.id)
+(11 rows)
+
+-- cleanup
+DROP TABLE fract_t;
+RESET max_parallel_workers_per_gather;
+RESET enable_partitionwise_join;
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index d97b5b69ff..67f506361f 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -1142,3 +1142,28 @@ SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.c = t2
 EXPLAIN (COSTS OFF)
 SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c) WHERE ((t1.b >= 100 AND t1.b < 110) OR (t1.b >= 200 AND t1.b < 210)) AND ((t2.b >= 100 AND t2.b < 110) OR (t2.b >= 200 AND t2.b < 210)) AND t1.c IN ('0004', '0009') ORDER BY t1.a, t1.b;
 SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c) WHERE ((t1.b >= 100 AND t1.b < 110) OR (t1.b >= 200 AND t1.b < 210)) AND ((t2.b >= 100 AND t2.b < 110) OR (t2.b >= 200 AND t2.b < 210)) AND t1.c IN ('0004', '0009') ORDER BY t1.a, t1.b;
+
+-- partitionwise join with fractional paths
+CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
+CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000');
+CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000');
+
+-- insert data
+INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
+ANALYZE fract_t;
+
+-- verify plan; nested index only scans
+SET max_parallel_workers_per_gather = 0;
+SET enable_partitionwise_join = on;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10;
+
+-- cleanup
+DROP TABLE fract_t;
+
+RESET max_parallel_workers_per_gather;
+RESET enable_partitionwise_join;
#14Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#13)
1 attachment(s)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Pushed, after clarifying the comments a bit.

I also looked into what would it take to consider incremental paths, and
in principle it doesn't seem all that complicated. The attached PoC
patch extends get_cheapest_fractional_path_for_pathkeys() to optionally
build incremental sort on paths if needed. There are two GUCs that make
experimenting simpler:

* enable_fractional_paths - disables fractional paths entirely, i.e. we
get behavior without the part I already pushed

* enable_fractional_incremental_paths - disables the incremental sort
part added by the attached patch

With this, I get this plan (see the test in partitionwise_join.sql)

test=# EXPLAIN (COSTS OFF)
test-# SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2)
ORDER BY id1 ASC, id2 ASC LIMIT 10;
QUERY PLAN

------------------------------------------------------------------------------
Limit
-> Merge Left Join
Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2))
-> Append
-> Index Only Scan using fract_t0_id1_id2_idx on
fract_t0 x_1
-> Incremental Sort
Sort Key: x_2.id1, x_2.id2
Presorted Key: x_2.id1
-> Index Scan using fract_t1_pkey on fract_t1 x_2
-> Materialize
-> Append
-> Incremental Sort
Sort Key: y_1.id1, y_1.id2
Presorted Key: y_1.id1
-> Index Scan using fract_t0_pkey on
fract_t0 y_1
Index Cond: (id1 = id1)
Filter: (id2 = id2)
-> Incremental Sort
Sort Key: y_2.id1, y_2.id2
Presorted Key: y_2.id1
-> Index Scan using fract_t1_pkey on
fract_t1 y_2
Index Cond: (id1 = id1)
Filter: (id2 = id2)
(23 rows)

instead of

QUERY PLAN

------------------------------------------------------------------------------
Limit
-> Incremental Sort
Sort Key: x.id1, x.id2
Presorted Key: x.id1
-> Merge Left Join
Merge Cond: (x.id1 = y.id1)
Join Filter: (x.id2 = y.id2)
-> Append
-> Index Scan using fract_t0_pkey on fract_t0 x_1
-> Index Scan using fract_t1_pkey on fract_t1 x_2
-> Materialize
-> Append
-> Index Scan using fract_t0_pkey on
fract_t0 y_1
-> Index Scan using fract_t1_pkey on
fract_t1 y_2
(14 rows)

i.e. the incremental sorts moved below the merge join (and the cost is
lower, but that's not shown here).

So that seems reasonable, but there's a couple issues too:

1) Planning works (hence we can run explain), but execution fails
because of segfault in CheckVarSlotCompatibility - it gets NULL slot for
some reason. I haven't looked into the details, but I'd guess we need to
pass a different rel to create_incrementalsort_path, not childrel.

2) enable_partitionwisejoin=on seems to cause some confusion, because it
results in picking a different plan with higher cost. But that's clearly
not because of this patch.

3) I'm still a bit skeptical about the cost of this implementation - it
builds the incrementalsort path, just to throw most of the paths away.
It'd be much better to just calculate the cost, which should be much
cheaper, and only do the heavy work for the one "best" path.

4) One thing I did not realize before is what pathkeys we even consider
here. Imagine you have two tables:

CREATE TABLE t1 (a int, b int, primary key (a));
CREATE TABLE t2 (a int, b int, primary key (a));

and query

SELECT * FROM t1 JOIN t2 USING (a,b);

It seems reasonable to also consider pathkeys (a,b) because that's make
e.g. mergejoin much cheaper, right? But no, we'll not do that - we only
consider pathkeys that at least one child relation has, so in this case
it's just (a). Which means we'll never consider incremental sort for
this particular example.

It's a bit strange, because it's enough to create index on (a,b) for
just one of the relations, and it'll suddenly consider building
incremental sort on both sides.

I don't plan to pursue this further at this point, so I pushed the first
part because that's a fairly simple improvement over what we have now.
But it seems like a fairly promising area for improvements.

Also, the non-intuitive effects of enabling partition-wise joins (i.e.
picking plans with higher estimated costs) is something worth exploring,
I guess. But that's a separate issue.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Consider-incremental-sort-in-generate_orderedappend_.patchtext/x-patch; charset=UTF-8; name=0001-Consider-incremental-sort-in-generate_orderedappend_.patchDownload
From 60c69a72ec421ac00deadf1f9c70f7acee689b67 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Wed, 12 Jan 2022 22:22:31 +0100
Subject: [PATCH] Consider incremental sort in generate_orderedappend_paths

---
 src/backend/optimizer/path/allpaths.c   | 11 ++++--
 src/backend/optimizer/path/pathkeys.c   | 47 ++++++++++++++++++++++--
 src/backend/optimizer/plan/planagg.c    |  6 ++--
 src/backend/utils/misc/guc.c            | 20 +++++++++++
 src/include/optimizer/paths.h           |  9 +++--
 src/test/regress/sql/partition_join.sql | 48 +++++++++++++++++++++++++
 6 files changed, 132 insertions(+), 9 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 169b1d53fc8..1cd3fc6f585 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -60,6 +60,8 @@ typedef struct pushdown_safety_info
 
 /* These parameters are set by GUC */
 bool		enable_geqo = false;	/* just in case GUC doesn't set it */
+bool		enable_fractional_paths = true;	/* just in case GUC doesn't set it */
+bool		enable_fractional_incremental_paths = true;	/* just in case GUC doesn't set it */
 int			geqo_threshold;
 int			min_parallel_table_scan_size;
 int			min_parallel_index_scan_size;
@@ -1784,15 +1786,18 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 			 * When needed (building fractional path), determine the cheapest
 			 * fractional path too.
 			 */
-			if (root->tuple_fraction > 0)
+			if (enable_fractional_paths && (root->tuple_fraction > 0))
 			{
 				double	path_fraction = (1.0 / root->tuple_fraction);
 
 				cheapest_fractional =
-					get_cheapest_fractional_path_for_pathkeys(childrel->pathlist,
+					get_cheapest_fractional_path_for_pathkeys(root,
+															  childrel,
+															  childrel->pathlist,
 															  pathkeys,
 															  NULL,
-															  path_fraction);
+															  path_fraction,
+															  enable_fractional_incremental_paths);
 
 				/*
 				 * If we found no path with matching pathkeys, use the cheapest
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 86a35cdef17..0c935199fff 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -22,6 +22,7 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
+#include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -446,17 +447,24 @@ get_cheapest_path_for_pathkeys(List *paths, List *pathkeys,
  * 'fraction' is the fraction of the total tuples expected to be retrieved
  */
 Path *
-get_cheapest_fractional_path_for_pathkeys(List *paths,
+get_cheapest_fractional_path_for_pathkeys(PlannerInfo *root, RelOptInfo *rel,
+										  List *paths,
 										  List *pathkeys,
 										  Relids required_outer,
-										  double fraction)
+										  double fraction,
+										  bool incremental_sort)
 {
 	Path	   *matched_path = NULL;
 	ListCell   *l;
+	bool		try_incremental_sort;
+
+	try_incremental_sort = (enable_incremental_sort && incremental_sort);
 
 	foreach(l, paths)
 	{
 		Path	   *path = (Path *) lfirst(l);
+		bool		is_sorted;
+		int			presorted_keys;
 
 		/*
 		 * Since cost comparison is a lot cheaper than pathkey comparison, do
@@ -468,7 +476,42 @@ get_cheapest_fractional_path_for_pathkeys(List *paths,
 
 		if (pathkeys_contained_in(pathkeys, path->pathkeys) &&
 			bms_is_subset(PATH_REQ_OUTER(path), required_outer))
+		{
 			matched_path = path;
+			continue;
+		}
+
+		/* Incremental sort disabled or not requested. */
+		if (!try_incremental_sort)
+			continue;
+
+		is_sorted = pathkeys_count_contained_in(pathkeys,
+												path->pathkeys,
+												&presorted_keys);
+
+		/*
+		 * If fully sorted, it should have been handled before. If there
+		 * are no presorted keys, no point in trying incremental sort.
+		 */
+		if (is_sorted || (presorted_keys == 0))
+			continue;
+
+		path = (Path *) create_incremental_sort_path(root,
+													 rel,
+													 path,
+													 pathkeys,
+													 presorted_keys,
+													 -1); /* FIXME can we get something better? */
+
+		/*
+		 * Since cost comparison is a lot cheaper than pathkey comparison, do
+		 * that first.  (XXX is that still true?)
+		 */
+		if (matched_path != NULL &&
+			compare_fractional_path_costs(matched_path, path, fraction) <= 0)
+			continue;
+
+		matched_path = path;
 	}
 	return matched_path;
 }
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 9330908cbf1..fc68305b8e4 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -439,10 +439,12 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo,
 		path_fraction = 1.0;
 
 	sorted_path =
-		get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
+		get_cheapest_fractional_path_for_pathkeys(root, final_rel,
+												  final_rel->pathlist,
 												  subroot->query_pathkeys,
 												  NULL,
-												  path_fraction);
+												  path_fraction,
+												  false);
 	if (!sorted_path)
 		return false;
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6fc5cbc09a5..bb8a9e3242a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1173,6 +1173,26 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_fractional_paths", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables fractional paths in generate_orderappend_paths."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_fractional_paths,
+		true,
+		NULL, NULL, NULL
+	},
+	{
+		{"enable_fractional_incremental_paths", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables fractional paths with incremental sort in generate_orderappend_paths."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_fractional_incremental_paths,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
 			gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 0c3a0b90c85..1ad62dd9794 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,8 @@
  * allpaths.c
  */
 extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_fractional_paths;
+extern PGDLLIMPORT bool enable_fractional_incremental_paths;
 extern PGDLLIMPORT int geqo_threshold;
 extern PGDLLIMPORT int min_parallel_table_scan_size;
 extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -207,10 +209,13 @@ extern Path *get_cheapest_path_for_pathkeys(List *paths, List *pathkeys,
 											Relids required_outer,
 											CostSelector cost_criterion,
 											bool require_parallel_safe);
-extern Path *get_cheapest_fractional_path_for_pathkeys(List *paths,
+extern Path *get_cheapest_fractional_path_for_pathkeys(PlannerInfo *root,
+													   RelOptInfo *rel,
+													   List *paths,
 													   List *pathkeys,
 													   Relids required_outer,
-													   double fraction);
+													   double fraction,
+													   bool incremental_sort);
 extern Path *get_cheapest_parallel_safe_total_inner(List *paths);
 extern List *build_index_pathkeys(PlannerInfo *root, IndexOptInfo *index,
 								  ScanDirection scandir);
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index 67f506361f8..9263e2464db 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -1165,5 +1165,53 @@ SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10
 -- cleanup
 DROP TABLE fract_t;
 
+-- partitionwise join with fractional paths and incremental sort
+CREATE TABLE fract_t (id1 BIGINT, id2 BIGINT, PRIMARY KEY (id1)) PARTITION BY RANGE (id1);
+CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000');
+CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000');
+
+
+CREATE INDEX ON fract_t0 (id1, id2);
+
+-- insert data
+INSERT INTO fract_t (id1, id2) (SELECT i, i FROM generate_series(0, 1999) s(i));
+ANALYZE fract_t;
+
+-- verify plan; nested index only scans
+SET max_parallel_workers_per_gather = 0;
+
+-- important, otherwise a plan with higher cost wins
+SET enable_partitionwise_join = off;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2) ORDER BY id1 ASC, id2 ASC LIMIT 10;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2) ORDER BY id1 DESC, id2 DESC LIMIT 10;
+
+-- no incremental sorts below append
+SET enable_fractional_incremental_paths = off;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2) ORDER BY id1 ASC, id2 ASC LIMIT 10;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2) ORDER BY id1 DESC, id2 DESC LIMIT 10;
+
+-- no fractional paths
+SET enable_fractional_paths = off;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2) ORDER BY id1 ASC, id2 ASC LIMIT 10;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2) ORDER BY id1 DESC, id2 DESC LIMIT 10;
+
+
+-- cleanup
+DROP TABLE fract_t;
+
+
+
 RESET max_parallel_workers_per_gather;
 RESET enable_partitionwise_join;
-- 
2.31.1

#15Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#14)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

FWIW this is now marked as committed. I've created a separate entry in
the next CF for the incremental sort part.

https://commitfest.postgresql.org/37/3513/

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Arne Roland
A.Roland@index.de
In reply to: Tomas Vondra (#14)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Hi!

From: Tomas Vondra <tomas.vondra@enterprisedb.com>
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

test-# SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2)
ORDER BY id1 ASC, id2 ASC LIMIT 10;
QUERY PLAN

------------------------------------------------------------------------------
Limit
-> Merge Left Join
Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2))
-> Append
-> Index Only Scan using fract_t0_id1_id2_idx on
fract_t0 x_1
-> Incremental Sort
Sort Key: x_2.id1, x_2.id2
Presorted Key: x_2.id1
-> Index Scan using fract_t1_pkey on fract_t1 x_2
-> Materialize
-> Append
-> Incremental Sort
Sort Key: y_1.id1, y_1.id2
Presorted Key: y_1.id1
-> Index Scan using fract_t0_pkey on
fract_t0 y_1
Index Cond: (id1 = id1)
Filter: (id2 = id2)
-> Incremental Sort
Sort Key: y_2.id1, y_2.id2
Presorted Key: y_2.id1
-> Index Scan using fract_t1_pkey on
fract_t1 y_2
Index Cond: (id1 = id1)
Filter: (id2 = id2)
(23 rows)
[...]
So that seems reasonable

Maybe I'm just slow, but that doesn't seem reasonable to me. It doesn't look like a valid plan to me. Sure all the nodes are arranged like I'd like them to be. But what are the id1/id2 bound we have in the index and filter conditions?

[...]but there's a couple issues too:

1) Planning works (hence we can run explain), but execution fails
because of segfault in CheckVarSlotCompatibility - it gets NULL slot for
some reason. I haven't looked into the details, but I'd guess we need to
pass a different rel to create_incrementalsort_path, not childrel.

In case my above concern is valid, maybe the executor is just as confused as I am. Such conditions should generate VarSlots, no? If so, where should they come from?

Sadly I don't have time to debug that in depth today.

2) enable_partitionwisejoin=on seems to cause some confusion, because it
results in picking a different plan with higher cost. But that's clearly
not because of this patch.

Short version: I'd neither conceptually expect costs to be lower in any case, nor would that be desirable, because our estimates aren't perfect.

Long version: What do you mean by confusion. The plan I get with the patch doesn't seem to confusing to me. Generally something like that is to be expected. enable_partitionwisejoin changes the way this planing works by rewriting the entire query effectively rule based. So we end up with a completely different query. I'd in particular expect slightly different startup costs.
So if we activate this we consider completely different plans, I struggle to come up with a meaningful example where there is any overlap at all. Thus it doesn't surprise me conceptually.
From practical experience I'd say: If they are about the same plan, the costs estimates work somewhat okish.
If we change the way we compose the nodes together, we sometimes end up with radical different costs for doing the same. While I suspect there are a lot of corner cases causing this, I've seen quite a few where this was due to the fact, that our planer often has insignificant statistics to know something and takes a guess. This has gotten way better of recent years, but it's in particular for non-trivial joins still a problem in practice.

3) I'm still a bit skeptical about the cost of this implementation - it
builds the incrementalsort path, just to throw most of the paths away.
It'd be much better to just calculate the cost, which should be much
cheaper, and only do the heavy work for the one "best" path.

Maybe we should profile this to get a rough estimate, how much time we spend building these incremental paths. From a code perspective it's non trivial to me where the time is lost.

4) One thing I did not realize before is what pathkeys we even consider
here. Imagine you have two tables:

CREATE TABLE t1 (a int, b int, primary key (a));
CREATE TABLE t2 (a int, b int, primary key (a));

and query

SELECT * FROM t1 JOIN t2 USING (a,b);

It seems reasonable to also consider pathkeys (a,b) because that's make
e.g. mergejoin much cheaper, right? But no, we'll not do that - we only
consider pathkeys that at least one child relation has, so in this case
it's just (a). Which means we'll never consider incremental sort for
this particular example.

It's a bit strange, because it's enough to create index on (a,b) for
just one of the relations, and it'll suddenly consider building
incremental sort on both sides.

I don't find that surprising, because the single index *might* make the incremental sort cheaper for the join *without* considering any external sort order.
So we would be switching up the incremental sort and the mergejoin, in case we need to sort anyways. That would mean considering also the sort order, that might be relevant on the outside. Sounds like an interesting idea for a later patch.

I don't plan to pursue this further at this point, so I pushed the first
part because that's a fairly simple improvement over what we have now.
But it seems like a fairly promising area for improvements.

I think 1) is pretty important, so we should sort that out sooner than later. Apart form that: :+1:
Thank you!

Regards
Arne

#17Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Arne Roland (#16)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

On 1/13/22 21:12, Arne Roland wrote:

 Hi!

From: Tomas Vondra <tomas.vondra@enterprisedb.com>
Subject: Re: PATCH: generate fractional cheapest paths in

generate_orderedappend_path

 
test-# SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2)
ORDER BY id1 ASC, id2 ASC LIMIT 10;
                                    QUERY PLAN

------------------------------------------------------------------------------

   Limit
     ->  Merge Left Join
           Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2))
           ->  Append
                 ->  Index Only Scan using fract_t0_id1_id2_idx on
                                           fract_t0 x_1
                 ->  Incremental Sort
                       Sort Key: x_2.id1, x_2.id2
                       Presorted Key: x_2.id1
                       ->  Index Scan using fract_t1_pkey on fract_t1 x_2
           ->  Materialize
                 ->  Append
                       ->  Incremental Sort
                             Sort Key: y_1.id1, y_1.id2
                             Presorted Key: y_1.id1
                             ->  Index Scan using fract_t0_pkey on
                                                  fract_t0 y_1
                                   Index Cond: (id1 = id1)
                                   Filter: (id2 = id2)
                       ->  Incremental Sort
                             Sort Key: y_2.id1, y_2.id2
                             Presorted Key: y_2.id1
                             ->  Index Scan using fract_t1_pkey on
                                                  fract_t1 y_2
                                   Index Cond: (id1 = id1)
                                   Filter: (id2 = id2)
(23 rows)
[...]
So that seems reasonable

Maybe I'm just slow, but that doesn't seem reasonable to me. It doesn't
look like a valid plan to me. Sure all the nodes are arranged like I'd
like them to be. But what are the id1/id2 bound we have in the index and
filter conditions?

I'm not claiming the plan is 100% correct, and you may have a point
about the index condition / filter in the materialize branch.

But the overall plan shape (with incremental sort nodes on both sides)
seems reasonable to me. The materialize node is expected (incremental
sort does not support rescans cheaply).

Maybe that's not any cheaper than just doing merge join on the first
column, and filter on the second. But we should be able to decide that
based on cost, I think.

[...]but there's a couple issues too:

1) Planning works (hence we can run explain), but execution fails
because of segfault in CheckVarSlotCompatibility - it gets NULL slot for
some reason. I haven't looked into the details, but I'd guess we need to
pass a different rel to create_incrementalsort_path, not childrel.

In case my above concern is valid, maybe the executor is just as
confused as I am. Such conditions should generate VarSlots, no? If so,
where should they come from?

Yeah, something like that.

Sadly I don't have time to debug that in depth today.

2) enable_partitionwisejoin=on seems to cause some confusion, because it
results in picking a different plan with higher cost. But that's clearly
not because of this patch.

Short version: I'd neither conceptually expect costs to be lower in any
case, nor would that be desirable, because our estimates aren't perfect.

Long version: What do you mean by confusion. The plan I get with the
patch doesn't seem to confusing to me. Generally something like that is
to be expected. enable_partitionwisejoin changes the way this planing
works by rewriting the entire query effectively rule based. So we end up
with a completely different query. I'd in particular expect slightly
different startup costs.
So if we activate this we consider completely different plans, I
struggle to come up with a meaningful example where there is any overlap
at all. Thus it doesn't surprise me conceptually.
From practical experience I'd say: If they are about the same plan, the
costs estimates work somewhat okish.
If we change the way we compose the nodes together, we sometimes end up
with radical different costs for doing the same. While I suspect there
are a lot of corner cases causing this, I've seen quite a few where this
was due to the fact, that our planer often has insignificant statistics
to know something and takes a guess. This has gotten way better of
recent years, but it's in particular for non-trivial joins still a
problem in practice.

By confusion I meant that if you plan the query with partitionwise join
enabled, you get a plan with cost X, and if you disable it you get a
different plan with cost Y, where (Y < X). Which is somewhat unexpected,
because that seems to simply reduce the set of plans we explore.

But as you say, enable_partitionwise_join kinda "switches" between two
planning modes. Not sure why we don't try building both paths and decide
based on costs.

3) I'm still a bit skeptical about the cost of this implementation - it
builds the incrementalsort path, just to throw most of the paths away.
It'd be much better to just calculate the cost, which should be much
cheaper, and only do the heavy work for the one "best" path.

Maybe we should profile this to get a rough estimate, how much time we
spend building these incremental paths. From a code perspective it's non
trivial to me where the time is lost.

TBH I haven't really done any profiling, but I wouldn't be surprised if
it got somewhat expensive for large number of child relations,
especially if there are a couple indexes on each. We do something
similar for nestloop (see initial_cost_nestloop).

4) One thing I did not realize before is what pathkeys we even consider
here. Imagine you have two tables:

     CREATE TABLE t1 (a int, b int, primary key (a));
     CREATE TABLE t2 (a int, b int, primary key (a));

and query

     SELECT * FROM t1 JOIN t2 USING (a,b);

It seems reasonable to also consider pathkeys (a,b) because that's make
e.g. mergejoin much cheaper, right? But no, we'll not do that - we only
consider pathkeys that at least one child relation has, so in this case
it's just (a). Which means we'll never consider incremental sort for
this particular example.

It's a bit strange, because it's enough to create index on (a,b) for
just one of the relations, and it'll suddenly consider building
incremental sort on both sides.

I don't find that surprising, because the single index *might* make the
incremental sort cheaper for the join *without* considering any external
sort order.
So we would be switching up the incremental sort and the mergejoin, in
case we need to sort anyways. That would mean considering also the sort
order, that might be relevant on the outside. Sounds like an interesting
idea for a later patch.

I'm not sure it depends on the incremental sort. I suspect in some cases
it might be faster to fully sort the merge join inputs, even if none of
the input paths has suitable pathkeys.

For example, if you do

... FROM t1 JOIN t2 USING (a,b) ...

but there are only indexes on (a), maybe sorting on (a,b) would win e.g.
if there's a lot of duplicate values in (a)?

I was thinking about this variation on example from the committed patch:

CREATE TABLE fract_t (id1 BIGINT, id2 BIGINT)
PARTITION BY RANGE (id1);

CREATE TABLE fract_t0 PARTITION OF fract_t
FOR VALUES FROM ('0') TO ('10');

CREATE TABLE fract_t1 PARTITION OF fract_t
FOR VALUES FROM ('10') TO ('20');

CREATE INDEX ON fract_t(id1);

INSERT INTO fract_t (id1, id2)
SELECT i/100000, i FROM generate_series(0, 1999999) s(i);

ANALYZE fract_t;

-- not interested in nestloop/hashjoin paths for now
set enable_hashjoin = off;
set enable_nestloop = off;
set max_parallel_workers_per_gather = 0;

EXPLAIN (COSTS OFF)
SELECT * FROM fract_t x JOIN fract_t y USING (id1, id2) order by id1;

which is now planned like this:

QUERY PLAN
-----------------------------------------------------
Merge Join
Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2))
-> Sort
Sort Key: x.id1, x.id2
-> Append
-> Seq Scan on fract_t0 x_1
-> Seq Scan on fract_t1 x_2
-> Materialize
-> Sort
Sort Key: y.id1, y.id2
-> Append
-> Seq Scan on fract_t0 y_1
-> Seq Scan on fract_t1 y_2
(13 rows)

But maybe a plan like this might be better:

QUERY PLAN
-----------------------------------------------------
Merge Join
Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2))
-> Incremental Sort
Sort Key: x.id1, x.id2
Presorted Key: x.id1
-> Append
-> Index Scan on fract_t0 x_1
-> Index Scan on fract_t1 x_2
-> Materialize
-> Incremental Sort
Sort Key: y.id1, y.id2
Presorted Key: y.id1
-> Append
-> Index Scan on fract_t0 y_1
-> Index Scan on fract_t1 y_2

or maybe even Incremental Sort + (Merge) Append on top. Which is what I
was trying to achieve with the experimental patch.

FWIW I did briefly look at the Merge Join + Incremental Sort plan too,
and it seems we don't consider incremental sorts there either. AFAICS
add_paths_to_joinrel justs call sort_inner_and_outer, which determines
interesting pathkeys in select_outer_pathkeys_for_merge, and I guess
that only considers pathkeys usable for full sort. In any case, we don't
actually add sort paths - we construct sort plans by calling make_sort
in create_mergejoin_plan. So there's not much chance for incremental
sort at all. That's kinda unfortunate, I guess. It's one of the
non-pathified places that we ignored while adding incremental sort.

I don't plan to pursue this further at this point, so I pushed the first
part because that's a fairly simple improvement over what we have now.
But it seems like a fairly promising area for improvements.

I think 1) is pretty important, so we should sort that out sooner than
later. Apart form that: :+1:
Thank you!

I agree it's worth investigating and experimenting with. We may end up
realizing those plans are not worth it, but we won't know until we try.

It may require replacing some of the hard-coded logic in createplan.c
with constructing regular alternative paths. IIRC we even did some of
this work in the incremental sort patch at some point, but then ripped
that out to keep the patch smaller / simpler ... need to look at it again.

Are you interested / willing to do some of this work?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#17)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

Hi,

On 2022-01-14 01:39:30 +0100, Tomas Vondra wrote:

Are you interested / willing to do some of this work?

This patch hasn't moved in the last two months. I think it may be time to
mark it as returned with feedback for now?

It's also failing tests, and has done so for months:

https://cirrus-ci.com/task/5308087077699584
https://api.cirrus-ci.com/v1/artifact/task/5308087077699584/log/src/test/regress/regression.diffs

Greetings,

Andres Freund

#19Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Andres Freund (#18)
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

On 3/22/22 01:18, Andres Freund wrote:

Hi,

On 2022-01-14 01:39:30 +0100, Tomas Vondra wrote:

Are you interested / willing to do some of this work?

This patch hasn't moved in the last two months. I think it may be time to
mark it as returned with feedback for now?

It's also failing tests, and has done so for months:

https://cirrus-ci.com/task/5308087077699584
https://api.cirrus-ci.com/v1/artifact/task/5308087077699584/log/src/test/regress/regression.diffs

Greetings,

Yeah. I think it's a useful improvement, but it needs much more work
than is doable by the end of this CF. RwF seems about right.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company