Why does create_gather_merge_plan need make_sort?

Started by James Colemanabout 5 years ago9 messages
#1James Coleman
jtc331@gmail.com

While looking at another issue I noticed that create_gather_merge_plan
calls make_sort if the subplan isn't sufficiently sorted. In all of
the cases I've seen where a gather merge path (not plan) is created
the input path is expected to be properly sorted, so I was wondering
if anyone happened to know what case is being handled by the make_sort
call. Removing it doesn't seem to break any tests.

Thanks,
James

#2Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: James Coleman (#1)
Re: Why does create_gather_merge_plan need make_sort?

On 11/20/20 11:24 PM, James Coleman wrote:

While looking at another issue I noticed that create_gather_merge_plan
calls make_sort if the subplan isn't sufficiently sorted. In all of
the cases I've seen where a gather merge path (not plan) is created
the input path is expected to be properly sorted, so I was wondering
if anyone happened to know what case is being handled by the make_sort
call. Removing it doesn't seem to break any tests.

Yeah, I think you're right this is dead code, essentially. We're only
ever calling create_gather_merge_path() with pathkeys matching the
subpath. And it's like that on REL_12_STABLE too, i.e. before the
incremental sort was introduced.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#2)
Re: Why does create_gather_merge_plan need make_sort?

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

On 11/20/20 11:24 PM, James Coleman wrote:

While looking at another issue I noticed that create_gather_merge_plan
calls make_sort if the subplan isn't sufficiently sorted. In all of
the cases I've seen where a gather merge path (not plan) is created
the input path is expected to be properly sorted, so I was wondering
if anyone happened to know what case is being handled by the make_sort
call. Removing it doesn't seem to break any tests.

Yeah, I think you're right this is dead code, essentially. We're only
ever calling create_gather_merge_path() with pathkeys matching the
subpath. And it's like that on REL_12_STABLE too, i.e. before the
incremental sort was introduced.

It's probably there by analogy to the other callers of
prepare_sort_from_pathkeys, which all do at least a conditional
make_sort(). I'd be inclined to leave it there; it's cheap insurance
against somebody weakening the existing behavior.

regards, tom lane

#4Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tom Lane (#3)
Re: Why does create_gather_merge_plan need make_sort?

On 11/22/20 10:31 PM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

On 11/20/20 11:24 PM, James Coleman wrote:

While looking at another issue I noticed that create_gather_merge_plan
calls make_sort if the subplan isn't sufficiently sorted. In all of
the cases I've seen where a gather merge path (not plan) is created
the input path is expected to be properly sorted, so I was wondering
if anyone happened to know what case is being handled by the make_sort
call. Removing it doesn't seem to break any tests.

Yeah, I think you're right this is dead code, essentially. We're only
ever calling create_gather_merge_path() with pathkeys matching the
subpath. And it's like that on REL_12_STABLE too, i.e. before the
incremental sort was introduced.

It's probably there by analogy to the other callers of
prepare_sort_from_pathkeys, which all do at least a conditional
make_sort(). I'd be inclined to leave it there; it's cheap insurance
against somebody weakening the existing behavior.

But how do we know it's safe to actually do the sort there, e.g. in
light of the volatility/parallel-safety issues discussed in other threads?

I agree the check may be useful, but maybe we should just do elog(ERROR)
instead.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5James Coleman
jtc331@gmail.com
In reply to: Tomas Vondra (#4)
Re: Why does create_gather_merge_plan need make_sort?

On Sun, Nov 22, 2020 at 5:07 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 11/22/20 10:31 PM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

On 11/20/20 11:24 PM, James Coleman wrote:

While looking at another issue I noticed that create_gather_merge_plan
calls make_sort if the subplan isn't sufficiently sorted. In all of
the cases I've seen where a gather merge path (not plan) is created
the input path is expected to be properly sorted, so I was wondering
if anyone happened to know what case is being handled by the make_sort
call. Removing it doesn't seem to break any tests.

Yeah, I think you're right this is dead code, essentially. We're only
ever calling create_gather_merge_path() with pathkeys matching the
subpath. And it's like that on REL_12_STABLE too, i.e. before the
incremental sort was introduced.

It's probably there by analogy to the other callers of
prepare_sort_from_pathkeys, which all do at least a conditional
make_sort(). I'd be inclined to leave it there; it's cheap insurance
against somebody weakening the existing behavior.

But how do we know it's safe to actually do the sort there, e.g. in
light of the volatility/parallel-safety issues discussed in other threads?

I agree the check may be useful, but maybe we should just do elog(ERROR)
instead.

That was my thought. I'm not a big fan of maintaining a "this might be
useful" path particularly when there isn't any case in the code or
tests at all that exercises it. In other words, not only is it not
useful [yet], but also we don't actually have any signal to know that
it works (or keeps working) -- whether through tests or production
use.

So I'm +1 on turning it into an ERROR log instead, since it seems to
me that encountering this case would almost certainly represent a bug
in path generation.

James

#6James Coleman
jtc331@gmail.com
In reply to: James Coleman (#5)
1 attachment(s)
Re: Why does create_gather_merge_plan need make_sort?

On Mon, Nov 23, 2020 at 8:19 AM James Coleman <jtc331@gmail.com> wrote:

On Sun, Nov 22, 2020 at 5:07 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 11/22/20 10:31 PM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

On 11/20/20 11:24 PM, James Coleman wrote:

While looking at another issue I noticed that create_gather_merge_plan
calls make_sort if the subplan isn't sufficiently sorted. In all of
the cases I've seen where a gather merge path (not plan) is created
the input path is expected to be properly sorted, so I was wondering
if anyone happened to know what case is being handled by the make_sort
call. Removing it doesn't seem to break any tests.

Yeah, I think you're right this is dead code, essentially. We're only
ever calling create_gather_merge_path() with pathkeys matching the
subpath. And it's like that on REL_12_STABLE too, i.e. before the
incremental sort was introduced.

It's probably there by analogy to the other callers of
prepare_sort_from_pathkeys, which all do at least a conditional
make_sort(). I'd be inclined to leave it there; it's cheap insurance
against somebody weakening the existing behavior.

But how do we know it's safe to actually do the sort there, e.g. in
light of the volatility/parallel-safety issues discussed in other threads?

I agree the check may be useful, but maybe we should just do elog(ERROR)
instead.

That was my thought. I'm not a big fan of maintaining a "this might be
useful" path particularly when there isn't any case in the code or
tests at all that exercises it. In other words, not only is it not
useful [yet], but also we don't actually have any signal to know that
it works (or keeps working) -- whether through tests or production
use.

So I'm +1 on turning it into an ERROR log instead, since it seems to
me that encountering this case would almost certainly represent a bug
in path generation.

Here's a patch to do that.

James

Attachments:

v1-0001-Error-if-gather-merge-paths-aren-t-sufficiently-s.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Error-if-gather-merge-paths-aren-t-sufficiently-s.patchDownload
From 798d520a30a44bc00f4f7bdb4a39e443212f0424 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Sun, 29 Nov 2020 09:38:59 -0500
Subject: [PATCH v1] Error if gather merge paths aren't sufficiently sorted

---
 src/backend/optimizer/plan/createplan.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 40abe6f9f6..dd58bc5688 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1793,13 +1793,14 @@ create_gather_merge_plan(PlannerInfo *root, GatherMergePath *best_path)
 										 &gm_plan->nullsFirst);
 
 
-	/* Now, insert a Sort node if subplan isn't sufficiently ordered */
+	/*
+	 * All gather merge paths should have already guaranteed the necessary sort
+	 * order either by adding an explicit sort node or by using presorted input.
+	 * If not we have no guarantee that applying a sort at this point would be
+	 * parallel safe,
+	 */
 	if (!pathkeys_contained_in(pathkeys, best_path->subpath->pathkeys))
-		subplan = (Plan *) make_sort(subplan, gm_plan->numCols,
-									 gm_plan->sortColIdx,
-									 gm_plan->sortOperators,
-									 gm_plan->collations,
-									 gm_plan->nullsFirst);
+		elog(ERROR, "gather merge input not sufficiently sorted");
 
 	/* Now insert the subplan under GatherMerge. */
 	gm_plan->plan.lefttree = subplan;
-- 
2.17.1

#7Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: James Coleman (#6)
Re: Why does create_gather_merge_plan need make_sort?

On 11/29/20 3:44 PM, James Coleman wrote:

On Mon, Nov 23, 2020 at 8:19 AM James Coleman <jtc331@gmail.com> wrote:

..

So I'm +1 on turning it into an ERROR log instead, since it seems to
me that encountering this case would almost certainly represent a bug
in path generation.

Here's a patch to do that.

Thanks. Isn't the comment incomplete? Should it mention just parallel
safety or also volatility?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8James Coleman
jtc331@gmail.com
In reply to: Tomas Vondra (#7)
1 attachment(s)
Re: Why does create_gather_merge_plan need make_sort?

On Sun, Nov 29, 2020 at 4:10 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 11/29/20 3:44 PM, James Coleman wrote:

On Mon, Nov 23, 2020 at 8:19 AM James Coleman <jtc331@gmail.com> wrote:

..

So I'm +1 on turning it into an ERROR log instead, since it seems to
me that encountering this case would almost certainly represent a bug
in path generation.

Here's a patch to do that.

Thanks. Isn't the comment incomplete? Should it mention just parallel
safety or also volatility?

Volatility makes it parallel unsafe, and I'm not sure I want to get
into listing every possible issue here, but in the interest of making
it more obviously representative of the kinds of issues we might
encounter, I've tweaked it to mention volatility also, as well as
refer to the issues as "examples" of such concerns.

James

Attachments:

v2-0001-Error-if-gather-merge-paths-aren-t-sufficiently-s.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Error-if-gather-merge-paths-aren-t-sufficiently-s.patchDownload
From b4b08b70d8961ea29587412e9a2ef4dd39111ff0 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Sun, 29 Nov 2020 09:38:59 -0500
Subject: [PATCH v2] Error if gather merge paths aren't sufficiently sorted

---
 src/backend/optimizer/plan/createplan.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 40abe6f9f6..5ecf9f4065 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1793,13 +1793,15 @@ create_gather_merge_plan(PlannerInfo *root, GatherMergePath *best_path)
 										 &gm_plan->nullsFirst);
 
 
-	/* Now, insert a Sort node if subplan isn't sufficiently ordered */
+	/*
+	 * All gather merge paths should have already guaranteed the necessary sort
+	 * order either by adding an explicit sort node or by using presorted input.
+	 * We can't simply add a sort here on additional pathkeys, because we can't
+	 * guarantee the sort would be safe. For example, expressions may be
+	 * volatile or otherwise parallel unsafe.
+	 */
 	if (!pathkeys_contained_in(pathkeys, best_path->subpath->pathkeys))
-		subplan = (Plan *) make_sort(subplan, gm_plan->numCols,
-									 gm_plan->sortColIdx,
-									 gm_plan->sortOperators,
-									 gm_plan->collations,
-									 gm_plan->nullsFirst);
+		elog(ERROR, "gather merge input not sufficiently sorted");
 
 	/* Now insert the subplan under GatherMerge. */
 	gm_plan->plan.lefttree = subplan;
-- 
2.17.1

#9Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: James Coleman (#8)
Re: Why does create_gather_merge_plan need make_sort?

For the record, this got committed as 6bc27698324 in December, along
with the other incremental sort fixes and improvements. I forgot to mark
it as committed in the app, so I've done that now.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company