An incorrect check in get_memoize_path
While reviewing another patch [1]/messages/by-id/60bf8d26-7c7e-4915-b544-afdb9020011d@gmail.com, I noticed an incorrect check in
get_memoize_path.
if (extra->inner_unique &&
(inner_path->param_info == NULL ||
bms_num_members(inner_path->param_info->ppi_serials) <
list_length(extra->restrictlist)))
return NULL;
This check is to ensure that, for unique joins, all the restriction
clauses are parameterized to enable the use of Memoize nodes.
However, ppi_clauses includes join clauses available from all outer
rels, not just the current outer rel, while extra->restrictlist only
includes the restriction clauses for the current join. This means the
check could pass even if a restriction clause isn't parameterized, as
long as another join clause, which doesn't belong to the current join,
is included in ppi_clauses. It's not very hard to construct a query
that exposes this issue.
create table t (a int, b int);
explain (costs off)
select * from t t0 left join
(t t1 left join t t2 on t1.a = t2.a
join lateral
(select distinct on (a, b) a, b, t0.a+t1.a+t2.a as x from t t3 offset 0) t3
on t1.a = t3.a and coalesce(t2.b) = t3.b)
on t0.b = t3.b;
QUERY PLAN
---------------------------------------------------------
Nested Loop Left Join
-> Seq Scan on t t0
-> Nested Loop
Join Filter: (COALESCE(t2.b) = t3.b)
-> Hash Left Join
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t t1
-> Hash
-> Seq Scan on t t2
-> Subquery Scan on t3
Filter: ((t0.b = t3.b) AND (t1.a = t3.a))
-> Unique
-> Sort
Sort Key: t3_1.a, t3_1.b
-> Seq Scan on t t3_1
(15 rows)
Consider the join to t3. It is a unique join, and not all of its
restriction clauses are parameterized. Despite this, the check still
passes.
Interestingly, although the check passes, no Memoize nodes are added
on top of t3's paths because paraminfo_get_equal_hashops insists that
all ppi_clauses must satisfy clause_sides_match_join (though I
actually question whether this is necessary, but that's a separate
issue). However, this does not justify the check being correct.
Hence, I propose the attached patch for the fix.
BTW, there is a XXX comment there saying that maybe we can make the
remaining join quals part of the inner scan's filter instead of the
join filter. I don't think this is possible in all cases. In the
above query, 'coalesce(t2.b) = t3.b' cannot be made part of t3's scan
filter, according to join_clause_is_movable_into. So I removed that
comment in the patch while we're here.
Any thoughts?
[1]: /messages/by-id/60bf8d26-7c7e-4915-b544-afdb9020011d@gmail.com
Thanks
Richard
Attachments:
v1-0001-Fix-an-incorrect-check-in-get_memoize_path.patchapplication/octet-stream; name=v1-0001-Fix-an-incorrect-check-in-get_memoize_path.patchDownload
From f1664f031f98ac20b4d810eeb960fd3ad7f3abad Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 7 Apr 2025 16:34:17 +0900
Subject: [PATCH v1] Fix an incorrect check in get_memoize_path
---
src/backend/optimizer/path/joinpath.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 18891ce9156..14ed9daeca8 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -748,16 +748,22 @@ get_memoize_path(PlannerInfo *root, RelOptInfo *innerrel,
*
* Lateral vars needn't be considered here as they're not considered when
* determining if the join is unique.
- *
- * XXX this could be enabled if the remaining join quals were made part of
- * the inner scan's filter instead of the join filter. Maybe it's worth
- * considering doing that?
*/
- if (extra->inner_unique &&
- (inner_path->param_info == NULL ||
- bms_num_members(inner_path->param_info->ppi_serials) <
- list_length(extra->restrictlist)))
- return NULL;
+ if (extra->inner_unique)
+ {
+ Bitmapset *ppi_serials;
+
+ if (inner_path->param_info == NULL)
+ return NULL;
+
+ ppi_serials = inner_path->param_info->ppi_serials;
+
+ foreach_node(RestrictInfo, rinfo, extra->restrictlist)
+ {
+ if (!bms_is_member(rinfo->rinfo_serial, ppi_serials))
+ return NULL;
+ }
+ }
/*
* We can't use a memoize node if there are volatile functions in the
--
2.43.0
Hi Richard
Thank you work on this
- * - * XXX this could be enabled if the remaining join quals were made
part of - * the inner scan's filter instead of the join filter. Maybe it's
worth - * considering doing that? */ - if (extra->inner_unique && -
(inner_path->param_info == NULL || -
bms_num_members(inner_path->param_info->ppi_serials) < -
list_length(extra->restrictlist))) - return NULL; + if
(extra->inner_unique) + { + Bitmapset *ppi_serials; + + if
(inner_path->param_info == NULL) + return NULL; + + ppi_serials =
inner_path->param_info->ppi_serials; + + foreach_node(RestrictInfo, rinfo,
extra->restrictlist) + { + if (!bms_is_member(rinfo->rinfo_serial,
ppi_serials)) + return NULL; + } + } The refined version introduces a more
precise validation by:
1. Explicitly verifying each restriction – Instead of relying on a count
comparison, it checks whether every restriction's serial number is included
in the inner path’s parameter info (ppi_serials).
2. Reducing false negatives – The optimizer now only discards a path if a
restriction is definitively unhandled, allowing more valid plans to be
considered.
On Mon, Apr 7, 2025 at 3:50 PM Richard Guo <guofenglinux@gmail.com> wrote:
Show quoted text
While reviewing another patch [1], I noticed an incorrect check in
get_memoize_path.if (extra->inner_unique &&
(inner_path->param_info == NULL ||
bms_num_members(inner_path->param_info->ppi_serials) <
list_length(extra->restrictlist)))
return NULL;This check is to ensure that, for unique joins, all the restriction
clauses are parameterized to enable the use of Memoize nodes.
However, ppi_clauses includes join clauses available from all outer
rels, not just the current outer rel, while extra->restrictlist only
includes the restriction clauses for the current join. This means the
check could pass even if a restriction clause isn't parameterized, as
long as another join clause, which doesn't belong to the current join,
is included in ppi_clauses. It's not very hard to construct a query
that exposes this issue.create table t (a int, b int);
explain (costs off)
select * from t t0 left join
(t t1 left join t t2 on t1.a = t2.a
join lateral
(select distinct on (a, b) a, b, t0.a+t1.a+t2.a as x from t t3 offset
0) t3
on t1.a = t3.a and coalesce(t2.b) = t3.b)
on t0.b = t3.b;
QUERY PLAN
---------------------------------------------------------
Nested Loop Left Join
-> Seq Scan on t t0
-> Nested Loop
Join Filter: (COALESCE(t2.b) = t3.b)
-> Hash Left Join
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t t1
-> Hash
-> Seq Scan on t t2
-> Subquery Scan on t3
Filter: ((t0.b = t3.b) AND (t1.a = t3.a))
-> Unique
-> Sort
Sort Key: t3_1.a, t3_1.b
-> Seq Scan on t t3_1
(15 rows)Consider the join to t3. It is a unique join, and not all of its
restriction clauses are parameterized. Despite this, the check still
passes.Interestingly, although the check passes, no Memoize nodes are added
on top of t3's paths because paraminfo_get_equal_hashops insists that
all ppi_clauses must satisfy clause_sides_match_join (though I
actually question whether this is necessary, but that's a separate
issue). However, this does not justify the check being correct.Hence, I propose the attached patch for the fix.
BTW, there is a XXX comment there saying that maybe we can make the
remaining join quals part of the inner scan's filter instead of the
join filter. I don't think this is possible in all cases. In the
above query, 'coalesce(t2.b) = t3.b' cannot be made part of t3's scan
filter, according to join_clause_is_movable_into. So I removed that
comment in the patch while we're here.Any thoughts?
[1] /messages/by-id/60bf8d26-7c7e-4915-b544-afdb9020011d@gmail.com
Thanks
Richard
On 4/7/25 09:50, Richard Guo wrote:
Consider the join to t3. It is a unique join, and not all of its
restriction clauses are parameterized. Despite this, the check still
passes.
At least, this code looks more simple to understand, more 'armored' and
worth to change.
At the same time I think term 'Incorrect' is not good unless you show an
example where data returned is not consistent to the expected.
I think this inequality check has worked in couple with the
get_equal_hashops.
I think it may be pushed as is. But it worth to discover the
get_equal_hashops more deeply. Discovering your idea I wrote an example
(see attachment) which (with commented clause_sides_match_join) provides
an incorrect Memoize - that's why I ended up with support of your change
even when you can't show any buggy behaviour. But I've got an assertion
that is different from the error I expected to see (error on single_row
mode):
#0 0x00005583c45532ce in CheckVarSlotCompatibility
#1 0x00005583c4553244 in CheckExprStillValid
#2 0x00005583c45530ea in ExecInterpExprStillValid
#3 0x00005583c45a245c in ExecEvalExpr
#4 0x00005583c45a3985 in prepare_probe_slot
#5 0x00005583c45a3e52 in cache_lookup
#6 0x00005583c45a4313 in ExecMemoize
It looks strange.
--
regards, Andrei Lepikhov
Attachments:
On Mon, Apr 7, 2025 at 9:54 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 4/7/25 09:50, Richard Guo wrote:
Consider the join to t3. It is a unique join, and not all of its
restriction clauses are parameterized. Despite this, the check still
passes.
At the same time I think term 'Incorrect' is not good unless you show an
example where data returned is not consistent to the expected.
I think this inequality check has worked in couple with the
get_equal_hashops.
Do you mean this check is designed to work in conjunction with the
clause_sides_match_join check in paraminfo_get_equal_hashops? I would
consider this a very poor design. The purpose of this check is simply
to verify whether all restriction clauses are parameterized. I don't
understand why it needs to depend on paraminfo_get_equal_hashops in
such an unexpected way to function correctly. I can't see any
advantage to this design, other than that it's prone to bugs.
Moreover, in the case where not all restriction clauses are
parameterized, why waste CPU cycles running all the code after the
check only for paraminfo_get_equal_hashops to catch it later?
Additionally, I couldn't find any comments explaining this unusual
behavior, either in the check itself or in paraminfo_get_equal_hashops.
Thanks
Richard
On 4/8/25 08:32, Richard Guo wrote:
On Mon, Apr 7, 2025 at 9:54 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 4/7/25 09:50, Richard Guo wrote:
Consider the join to t3. It is a unique join, and not all of its
restriction clauses are parameterized. Despite this, the check still
passes.At the same time I think term 'Incorrect' is not good unless you show an
example where data returned is not consistent to the expected.
I think this inequality check has worked in couple with the
get_equal_hashops.Do you mean this check is designed to work in conjunction with the
clause_sides_match_join check in paraminfo_get_equal_hashops? I would
consider this a very poor design.
As I have written before, I am quite happy with the change you propose.
I just pointed out that term 'incorrect' usually means you may provide a
query which causes an error or inconsistent data which we can add to the
tests set. Current code may be described as 'kludge' lines - but I'm not
a native speaker, don't bikeshed here too much.
--
regards, Andrei Lepikhov
On Mon, Apr 7, 2025 at 4:50 PM Richard Guo <guofenglinux@gmail.com> wrote:
Hence, I propose the attached patch for the fix.
BTW, there is a XXX comment there saying that maybe we can make the
remaining join quals part of the inner scan's filter instead of the
join filter. I don't think this is possible in all cases. In the
above query, 'coalesce(t2.b) = t3.b' cannot be made part of t3's scan
filter, according to join_clause_is_movable_into. So I removed that
comment in the patch while we're here.Any thoughts?
Here is an updated patch with a commit message. Regarding
backporting, I'm inclined not to, given the lack of field reports.
Any objections to pushing it?
Thanks
Richard
Attachments:
v2-0001-Fix-an-incorrect-check-in-get_memoize_path.patchapplication/octet-stream; name=v2-0001-Fix-an-incorrect-check-in-get_memoize_path.patchDownload
From 7a4e7eec3061270942423713169dce070187548c Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 7 Apr 2025 16:34:17 +0900
Subject: [PATCH v2] Fix an incorrect check in get_memoize_path
Memoize typically marks cache entries as complete after fully scanning
the inner side of a join. However, in the case of unique joins, we
skip to the next outer tuple as soon as the first matching inner tuple
is found, leaving no opportunity to scan the inner side to completion.
To work around that, we mark cache entries as complete after fetching
the first matching inner tuple in unique joins.
This approach is only safe when all of the join's restriction clauses
are parameterized; otherwise, there is no guarantee that reading just
one tuple from the inner side is sufficient.
Currently, we check for this by verifying that the number of clauses
in ppi_clauses is no less than the number of the join's restriction
clauses. However, this check isn't entirely reliable, as ppi_clauses
includes join clauses available from all outer rels, not just the
current outer rel. This means the check could pass even if a
restriction clause isn't parameterized, as long as another join
clause, which doesn't belong to the current join, is included in
ppi_clauses.
To fix this, we explicitly check whether each restriction clause of
the current join is present in ppi_clauses.
While we're here, remove the XXX comment from the modified code, as
it's not justified; in certain cases, it's not possible to move a join
clause to the inner side.
This is arguably a bugfix, but no backpatch given the lack of field
reports.
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-8JPouj=wBDj4DhK-WO4+Xdx=A2jbjvvyyTBQneJ1=BQ@mail.gmail.com
---
src/backend/optimizer/path/joinpath.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 18891ce9156..14ed9daeca8 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -748,16 +748,22 @@ get_memoize_path(PlannerInfo *root, RelOptInfo *innerrel,
*
* Lateral vars needn't be considered here as they're not considered when
* determining if the join is unique.
- *
- * XXX this could be enabled if the remaining join quals were made part of
- * the inner scan's filter instead of the join filter. Maybe it's worth
- * considering doing that?
*/
- if (extra->inner_unique &&
- (inner_path->param_info == NULL ||
- bms_num_members(inner_path->param_info->ppi_serials) <
- list_length(extra->restrictlist)))
- return NULL;
+ if (extra->inner_unique)
+ {
+ Bitmapset *ppi_serials;
+
+ if (inner_path->param_info == NULL)
+ return NULL;
+
+ ppi_serials = inner_path->param_info->ppi_serials;
+
+ foreach_node(RestrictInfo, rinfo, extra->restrictlist)
+ {
+ if (!bms_is_member(rinfo->rinfo_serial, ppi_serials))
+ return NULL;
+ }
+ }
/*
* We can't use a memoize node if there are volatile functions in the
--
2.43.0
On 4/14/25 08:49, Richard Guo wrote:
On Mon, Apr 7, 2025 at 4:50 PM Richard Guo <guofenglinux@gmail.com> wrote:
Hence, I propose the attached patch for the fix.
BTW, there is a XXX comment there saying that maybe we can make the
remaining join quals part of the inner scan's filter instead of the
join filter. I don't think this is possible in all cases. In the
above query, 'coalesce(t2.b) = t3.b' cannot be made part of t3's scan
filter, according to join_clause_is_movable_into. So I removed that
comment in the patch while we're here.Any thoughts?
Here is an updated patch with a commit message. Regarding
backporting, I'm inclined not to, given the lack of field reports.
Any objections to pushing it?
No objections.
By the way, I think you should be less shy about adding to CC people who
have been involved in the discussion (at least David should have a
chance to know). Also, don't hesitate to use the 'Reviewed-by' tag to
let people know who else may remember the context of the problem, just
in case.
--
regards, Andrei Lepikhov
HI
No objections.It's a pity that the postgresql18 version has been
code-frozen
Thanks
On Mon, Apr 14, 2025 at 4:21 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
Show quoted text
On 4/14/25 08:49, Richard Guo wrote:
On Mon, Apr 7, 2025 at 4:50 PM Richard Guo <guofenglinux@gmail.com>
wrote:
Hence, I propose the attached patch for the fix.
BTW, there is a XXX comment there saying that maybe we can make the
remaining join quals part of the inner scan's filter instead of the
join filter. I don't think this is possible in all cases. In the
above query, 'coalesce(t2.b) = t3.b' cannot be made part of t3's scan
filter, according to join_clause_is_movable_into. So I removed that
comment in the patch while we're here.Any thoughts?
Here is an updated patch with a commit message. Regarding
backporting, I'm inclined not to, given the lack of field reports.
Any objections to pushing it?No objections.
By the way, I think you should be less shy about adding to CC people who
have been involved in the discussion (at least David should have a
chance to know). Also, don't hesitate to use the 'Reviewed-by' tag to
let people know who else may remember the context of the problem, just
in case.--
regards, Andrei Lepikhov
On Mon, Apr 14, 2025 at 8:08 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
No objections.It's a pity that the postgresql18 version has been code-frozen
v18 is now in feature freeze, but not code freeze, so bug fixes are
still allowed. I've pushed this patch after adding the "Reviewed-by"
tags.
Thanks
Richard