Fix bogus Asserts in calc_non_nestloop_required_outer
As stated in [1]/messages/by-id/CAMbWs4_UoVcCwkVMfi9TjSC=o5U6BRHUNZiVhrvSbDfU2HaeDA@mail.gmail.com, all paths arriving here are parameterized by top
parents, so we should check against top_parent_relids if it exists in
the two Asserts.
Attached is a patch fixing that.
[1]: /messages/by-id/CAMbWs4_UoVcCwkVMfi9TjSC=o5U6BRHUNZiVhrvSbDfU2HaeDA@mail.gmail.com
/messages/by-id/CAMbWs4_UoVcCwkVMfi9TjSC=o5U6BRHUNZiVhrvSbDfU2HaeDA@mail.gmail.com
Thanks
Richard
Attachments:
v1-0001-Fix-bogus-Asserts-in-calc_non_nestloop_required_outer.patchapplication/octet-stream; name=v1-0001-Fix-bogus-Asserts-in-calc_non_nestloop_required_outer.patchDownload
From 6c1ecb88fcd482b1fe0c805bd888f140967164d4 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 3 Aug 2023 10:18:53 +0800
Subject: [PATCH v1] Fix bogus Asserts in calc_non_nestloop_required_outer
---
src/backend/optimizer/util/pathnode.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index f123fcb41e..062928288d 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2391,8 +2391,14 @@ calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path)
Relids required_outer;
/* neither path can require rels from the other */
- Assert(!bms_overlap(outer_paramrels, inner_path->parent->relids));
- Assert(!bms_overlap(inner_paramrels, outer_path->parent->relids));
+ Assert(!bms_overlap(outer_paramrels,
+ inner_path->parent->top_parent_relids ?
+ inner_path->parent->top_parent_relids :
+ inner_path->parent->relids));
+ Assert(!bms_overlap(inner_paramrels,
+ outer_path->parent->top_parent_relids ?
+ outer_path->parent->top_parent_relids :
+ outer_path->parent->relids));
/* form the union ... */
required_outer = bms_union(outer_paramrels, inner_paramrels);
/* we do not need an explicit test for empty; bms_union gets it right */
--
2.31.0
Hi Richard,
As stated in [1], all paths arriving here are parameterized by top
parents, so we should check against top_parent_relids if it exists in
the two Asserts.Attached is a patch fixing that.
Probably it's just because of my limited experience with the optimizer
but I don't find the proposed change particularly straightforward. I
would suggest adding a comment before the Assert's and/or a detailed
commit message would be helpful.
Other than that I can confirm that both branches in the Assert's are
executed and the tests pass in different test environments.
--
Best regards,
Aleksander Alekseev
On Thu, Sep 7, 2023 at 11:09 PM Aleksander Alekseev <
aleksander@timescale.com> wrote:
Probably it's just because of my limited experience with the optimizer
but I don't find the proposed change particularly straightforward. I
would suggest adding a comment before the Assert's and/or a detailed
commit message would be helpful.
Comment is now added above the Asserts. Thanks for taking an interest
in this.
Thanks
Richard
Attachments:
v2-0001-Fix-bogus-Asserts-in-calc_non_nestloop_required_outer.patchapplication/octet-stream; name=v2-0001-Fix-bogus-Asserts-in-calc_non_nestloop_required_outer.patchDownload
From ccfeb7e12d4140e9c5f91741ba00f15fa9fffef1 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 3 Aug 2023 10:18:53 +0800
Subject: [PATCH v2] Fix bogus Asserts in calc_non_nestloop_required_outer
---
src/backend/optimizer/util/pathnode.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 0b1d17b9d3..3ddc867d06 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2396,9 +2396,20 @@ calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path)
Relids inner_paramrels = PATH_REQ_OUTER(inner_path);
Relids required_outer;
- /* neither path can require rels from the other */
- Assert(!bms_overlap(outer_paramrels, inner_path->parent->relids));
- Assert(!bms_overlap(inner_paramrels, outer_path->parent->relids));
+ /*
+ * Neither path can require rels from the other.
+ *
+ * Note that paths are parameterized by top-level parents, so run
+ * parameterization tests on the parent relids.
+ */
+ Assert(!bms_overlap(outer_paramrels,
+ inner_path->parent->top_parent_relids ?
+ inner_path->parent->top_parent_relids :
+ inner_path->parent->relids));
+ Assert(!bms_overlap(inner_paramrels,
+ outer_path->parent->top_parent_relids ?
+ outer_path->parent->top_parent_relids :
+ outer_path->parent->relids));
/* form the union ... */
required_outer = bms_union(outer_paramrels, inner_paramrels);
/* we do not need an explicit test for empty; bms_union gets it right */
--
2.31.0
I have reviewed the changes and it looks fine.
The new status of this patch is: Ready for Committer
On Fri, Nov 3, 2023 at 2:47 AM Richard Guo <guofenglinux@gmail.com> wrote:
Comment is now added above the Asserts. Thanks for taking an interest
in this.
I'd like to suggest rewording this comment a little more. Here's my proposal:
Both of the paths passed to this function are still parameterized by
the topmost parent, because reparameterize_path_by_child() hasn't been
called yet. So, these sanity checks use the parent relids.
What do you think?
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Nov 3, 2023 at 2:47 AM Richard Guo <guofenglinux@gmail.com> wrote:
Comment is now added above the Asserts. Thanks for taking an interest
in this.
I'd like to suggest rewording this comment a little more. Here's my proposal:
Both of the paths passed to this function are still parameterized by
the topmost parent, because reparameterize_path_by_child() hasn't been
called yet. So, these sanity checks use the parent relids.
What do you think?
I don't think I believe this code change, let alone any of the
explanations for it. The point of these Asserts is to be sure that
we don't form an alleged parameterization set that includes any rels
that are included in the new join, because that would be obviously
wrong. They do check that as they stand. I'm not sure what invariant
the proposed revision would be enforcing.
There might be an argument for leaving the existing Asserts in
place and *also* checking
Assert(!bms_overlap(outer_paramrels,
inner_path->parent->top_parent_relids));
Assert(!bms_overlap(inner_paramrels,
outer_path->parent->top_parent_relids));
but I'm not quite convinced what the point of that is.
regards, tom lane
On Fri, Jan 5, 2024 at 4:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I don't think I believe this code change, let alone any of the
explanations for it. The point of these Asserts is to be sure that
we don't form an alleged parameterization set that includes any rels
that are included in the new join, because that would be obviously
wrong. They do check that as they stand. I'm not sure what invariant
the proposed revision would be enforcing.
Well, this explanation made sense to me:
/messages/by-id/CAMbWs4-+Gs0HJ9ouBUb=qwHsGCXxG+92eJzLOpCkedvgtOWQ=Q@mail.gmail.com
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Jan 5, 2024 at 4:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I don't think I believe this code change, let alone any of the
explanations for it. The point of these Asserts is to be sure that
we don't form an alleged parameterization set that includes any rels
that are included in the new join, because that would be obviously
wrong. They do check that as they stand. I'm not sure what invariant
the proposed revision would be enforcing.
Well, this explanation made sense to me:
/messages/by-id/CAMbWs4-+Gs0HJ9ouBUb=qwHsGCXxG+92eJzLOpCkedvgtOWQ=Q@mail.gmail.com
The argument for the patch as proposed is that we should make the
mergejoin and hashjoin code paths do what the nestloop path is doing.
However, as I replied further down in that other thread, I'm not
exactly convinced that the nestloop path is doing the right thing.
I've not had time to look closer at that, though.
regards, tom lane
On Sat, Jan 6, 2024 at 4:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Well, this explanation made sense to me:
/messages/by-id/CAMbWs4-+Gs0HJ9ouBUb=qwHsGCXxG+92eJzLOpCkedvgtOWQ=Q@mail.gmail.comThe argument for the patch as proposed is that we should make the
mergejoin and hashjoin code paths do what the nestloop path is doing.
However, as I replied further down in that other thread, I'm not
exactly convinced that the nestloop path is doing the right thing.
I've not had time to look closer at that, though.
I don't really understand what you were saying in your response there,
or what you're saying here. It seems to me that if the path is
parameterized by top relids, and the assertion is verifying that a
certain set of non-toprelids i.e. otherrels isn't present, then
obviously the assertion is going to pass, but it's not checking what
it purports to be checking. The ostensible purpose of the assertion is
to check that neither side is parameterized by the other side, because
a non-nestloop can't satisfy a parameterization. But with the
assertion as currently formulated, it would pass even if one side
*were* parameterized by the other side, because outer_paramrels and
inner_paramrels would contain child relations, and the set that it was
being compared to would contain only top-parents, and so they wouldn't
overlap.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, Jan 6, 2024 at 4:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The argument for the patch as proposed is that we should make the
mergejoin and hashjoin code paths do what the nestloop path is doing.
However, as I replied further down in that other thread, I'm not
exactly convinced that the nestloop path is doing the right thing.
I've not had time to look closer at that, though.
I don't really understand what you were saying in your response there,
or what you're saying here. It seems to me that if the path is
parameterized by top relids, and the assertion is verifying that a
certain set of non-toprelids i.e. otherrels isn't present, then
obviously the assertion is going to pass, but it's not checking what
it purports to be checking.
I think we're talking at cross-purposes. What I was wondering about
(at least further down in the thread) was whether we shouldn't be
checking *both* the "real" and the "parent" relids to make sure they
don't overlap the parameterization sets. But it's probably overkill.
After reading the code again I agree that the parent relids are more
useful to check.
However, I still don't like Richard's patch too much as-is, because
the Asserts are difficult to read/understand and even more difficult
to compare to the other code path. I think we want to keep the
nestloop and not-nestloop paths as visually similar as possible,
so I propose we do it more like the attached (where I also borrowed
some of your wording for the comments).
A variant of this could be to ifdef out all the added code in
non-Assert builds:
Relids outer_paramrels = PATH_REQ_OUTER(outer_path);
Relids inner_paramrels = PATH_REQ_OUTER(inner_path);
Relids required_outer;
#ifdef USE_ASSERT_CHECKING
Relids innerrelids;
Relids outerrelids;
/*
* Any parameterization of the input paths refers to topmost parents of
* the relevant relations, because reparameterize_path_by_child() hasn't
* been called yet. So we must consider topmost parents of the relations
* being joined, too, while checking for disallowed parameterization
* cases.
*/
if (inner_path->parent->top_parent_relids)
innerrelids = inner_path->parent->top_parent_relids;
else
innerrelids = inner_path->parent->relids;
if (outer_path->parent->top_parent_relids)
outerrelids = outer_path->parent->top_parent_relids;
else
outerrelids = outer_path->parent->relids;
/* neither path can require rels from the other */
Assert(!bms_overlap(outer_paramrels, innerrelids));
Assert(!bms_overlap(inner_paramrels, outerrelids));
#endif
/* form the union ... */
but I'm not sure that's better. Probably any reasonable compiler
will throw away the whole calculation upon noticing the outputs
are unused.
regards, tom lane
Attachments:
v3-0001-Fix-bogus-Asserts-in-calc_non_nestloop_required_outer.patchtext/x-diff; charset=us-ascii; name*0=v3-0001-Fix-bogus-Asserts-in-calc_non_nestloop_required_out; name*1=er.patchDownload
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 3fd5a24fad..e9def9d540 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -730,8 +730,11 @@ try_nestloop_path(PlannerInfo *root,
return;
/*
- * Paths are parameterized by top-level parents, so run parameterization
- * tests on the parent relids.
+ * Any parameterization of the input paths refers to topmost parents of
+ * the relevant relations, because reparameterize_path_by_child() hasn't
+ * been called yet. So we must consider topmost parents of the relations
+ * being joined, too, while determining parameterization of the result and
+ * checking for disallowed parameterization cases.
*/
if (innerrel->top_parent_relids)
innerrelids = innerrel->top_parent_relids;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index f7ac71087a..8dbf790e89 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2360,6 +2360,9 @@ create_foreign_upper_path(PlannerInfo *root, RelOptInfo *rel,
* calc_nestloop_required_outer
* Compute the required_outer set for a nestloop join path
*
+ * Note: when considering a child join, the inputs nonetheless use top-level
+ * parent relids
+ *
* Note: result must not share storage with either input
*/
Relids
@@ -2394,11 +2397,30 @@ calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path)
{
Relids outer_paramrels = PATH_REQ_OUTER(outer_path);
Relids inner_paramrels = PATH_REQ_OUTER(inner_path);
+ Relids innerrelids PG_USED_FOR_ASSERTS_ONLY;
+ Relids outerrelids PG_USED_FOR_ASSERTS_ONLY;
Relids required_outer;
+ /*
+ * Any parameterization of the input paths refers to topmost parents of
+ * the relevant relations, because reparameterize_path_by_child() hasn't
+ * been called yet. So we must consider topmost parents of the relations
+ * being joined, too, while checking for disallowed parameterization
+ * cases.
+ */
+ if (inner_path->parent->top_parent_relids)
+ innerrelids = inner_path->parent->top_parent_relids;
+ else
+ innerrelids = inner_path->parent->relids;
+
+ if (outer_path->parent->top_parent_relids)
+ outerrelids = outer_path->parent->top_parent_relids;
+ else
+ outerrelids = outer_path->parent->relids;
+
/* neither path can require rels from the other */
- Assert(!bms_overlap(outer_paramrels, inner_path->parent->relids));
- Assert(!bms_overlap(inner_paramrels, outer_path->parent->relids));
+ Assert(!bms_overlap(outer_paramrels, innerrelids));
+ Assert(!bms_overlap(inner_paramrels, outerrelids));
/* form the union ... */
required_outer = bms_union(outer_paramrels, inner_paramrels);
/* we do not need an explicit test for empty; bms_union gets it right */
On Mon, Jan 8, 2024 at 5:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we're talking at cross-purposes. What I was wondering about
(at least further down in the thread) was whether we shouldn't be
checking *both* the "real" and the "parent" relids to make sure they
don't overlap the parameterization sets. But it's probably overkill.
After reading the code again I agree that the parent relids are more
useful to check.
Hmm, OK.
But we could also find some way to assert that the parameterization
sets contain only top-most rels. That would be strictly stronger than
asserting that they don't contain any non-topmost rels that exist on
the other side.
Trying to say that more clearly, I think the rules here are:
1. outer_paramrels can't contain ANY non-top rels
2. outer_paramrels can't contain top rels that appear on the other
side of the join
What you're proposing to assert here is (2) plus a weaker version of
(1), namely "no non-top rels that appear on the other side of the
join". That's not wrong, but I don't see why it's better than checking
only (2) or checking exactly (1) and (2).
However, I still don't like Richard's patch too much as-is, because
the Asserts are difficult to read/understand and even more difficult
to compare to the other code path. I think we want to keep the
nestloop and not-nestloop paths as visually similar as possible,
so I propose we do it more like the attached (where I also borrowed
some of your wording for the comments).
I don't understand what this is parallel to;
calc_nestloop_required_outer does no similar dance.
If we don't set top_relids when it's the same as relids, then I agree
that doing this dance is warranted, but otherwise it's just clutter.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Jan 8, 2024 at 5:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we're talking at cross-purposes. What I was wondering about
(at least further down in the thread) was whether we shouldn't be
checking *both* the "real" and the "parent" relids to make sure they
don't overlap the parameterization sets. But it's probably overkill.
But we could also find some way to assert that the parameterization
sets contain only top-most rels.
Hmm ... perhaps worth doing. I think bms_is_subset against
all_baserels would work.
However, I still don't like Richard's patch too much as-is, because
the Asserts are difficult to read/understand and even more difficult
to compare to the other code path. I think we want to keep the
nestloop and not-nestloop paths as visually similar as possible,
so I propose we do it more like the attached (where I also borrowed
some of your wording for the comments).
I don't understand what this is parallel to;
calc_nestloop_required_outer does no similar dance.
No, because the dance is done in its caller. The fact that these
two functions don't have more-similar argument lists is a bit of
a wart, perhaps.
regards, tom lane
I wrote:
Robert Haas <robertmhaas@gmail.com> writes:
But we could also find some way to assert that the parameterization
sets contain only top-most rels.
Hmm ... perhaps worth doing. I think bms_is_subset against
all_baserels would work.
[ tries that ... ] Nope, it needs to be all_query_rels. I plead
ENOCAFFEINE. However, while we could easily add
+ Assert(bms_is_subset(inner_paramrels, root->all_query_rels));
+ Assert(bms_is_subset(outer_paramrels, root->all_query_rels));
to try_nestloop_path, that doesn't work as-is in
calc_non_nestloop_required_outer because we don't pass "root" to it.
We could add that, or perhaps contemplate some more-extensive
rearrangement to make the division of labor between
calc_nestloop_required_outer and its caller more like that between
calc_non_nestloop_required_outer and its callers. But I'm feeling
kind of unenthused about that, because
(1) I don't see how to rearrange things without duplicating code:
try_nestloop_path needs access to some of these values, while moving
any work out of calc_non_nestloop_required_outer would mean two
copies in its two callers. So there are reasons why that's not
perfectly symmetric. (But I still maintain that we should try to
make the code look similar between these two code paths, when
considering both the calc_xxx functions and their callers.)
(2) joinpath.c already knows that parameterization sets mention
only top-level relids, specifically at the definition and uses of
PATH_PARAM_BY_PARENT, and I bet there are more dependencies elsewhere.
So I'm not sure about the value of asserting it only here.
In short I'm inclined to apply v3 as-is.
regards, tom lane