BUG #18305: Unexpected error: "WindowFunc not found in subplan target lists" triggered by subqueries
The following bug has been logged on the website:
Bug reference: 18305
Logged by: Zuming Jiang
Email address: zuming.jiang@inf.ethz.ch
PostgreSQL version: 16.1
Operating system: Ubuntu 20.04
Description:
My fuzzer finds a bug in Postgres 17devel, which triggers an unexpected
error "ERROR: WindowFunc not found in subplan target lists".
--- Set up database ---
create table exeet_t3 (pkey int4);
create view exeet_t8 as
select
ntile(exeet_subq_0.c_0) over () as c_0
from
(select (select pkey from exeet_t3 order by pkey limit 1 offset 6) as
c_0) as exeet_subq_0;
The fuzzer generates a test case:
--- Test case ---
select
1 as c_1
from
exeet_t8 as exeet_ref_17
where exeet_ref_17.c_0 < 0;
--- Expected behavior ---
The test case should not trigger any error.
--- Actual behavior ---
The test case trigger an error:
ERROR: WindowFunc not found in subplan target lists
--- Postgres version ---
Github commit: b0f0a9432d0b6f53634a96715f2666f6d4ea25a1
Version: PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc
(Ubuntu
9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
--- Platform information ---
Platform: Ubuntu 20.04
Kernel: Linux 5.4.0-147-generic
PG Bug reporting form <noreply@postgresql.org> writes:
My fuzzer finds a bug in Postgres 17devel, which triggers an unexpected
error "ERROR: WindowFunc not found in subplan target lists".
create table exeet_t3 (pkey int4);
create view exeet_t8 as
select
ntile(exeet_subq_0.c_0) over () as c_0
from
(select (select pkey from exeet_t3 order by pkey limit 1 offset 6) as
c_0) as exeet_subq_0;
select
1 as c_1
from
exeet_t8 as exeet_ref_17
where exeet_ref_17.c_0 < 0;
ERROR: WindowFunc not found in subplan target lists
Thanks for the report! Bisecting shows that this broke at
456fa635a909ee36f73ca84d340521bd730f265f is the first bad commit
commit 456fa635a909ee36f73ca84d340521bd730f265f
Author: David Rowley <drowley@postgresql.org>
Date: Fri Jan 27 16:08:41 2023 +1300
Teach planner about more monotonic window functions
9d9c02ccd introduced runConditions for window functions to allow
monotonic window function evaluation to be made more efficient when the
window function value went beyond some value that it would never go back
from due to its monotonic nature. That commit added prosupport functions
to inform the planner that row_number(), rank(), dense_rank() and some
forms of count(*) were monotonic. Here we add support for ntile(),
cume_dist() and percent_rank().
So I'm betting there's something wrong with the ntile support
function, but I've not dug deeper.
regards, tom lane
On Mon, Jan 22, 2024 at 11:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
My fuzzer finds a bug in Postgres 17devel, which triggers an unexpected
error "ERROR: WindowFunc not found in subplan target lists".create table exeet_t3 (pkey int4);
create view exeet_t8 as
select
ntile(exeet_subq_0.c_0) over () as c_0
from
(select (select pkey from exeet_t3 order by pkey limit 1 offset 6) as
c_0) as exeet_subq_0;select
1 as c_1
from
exeet_t8 as exeet_ref_17
where exeet_ref_17.c_0 < 0;
ERROR: WindowFunc not found in subplan target listsThanks for the report! Bisecting shows that this broke at
456fa635a909ee36f73ca84d340521bd730f265f is the first bad commit
commit 456fa635a909ee36f73ca84d340521bd730f265f
Author: David Rowley <drowley@postgresql.org>
Date: Fri Jan 27 16:08:41 2023 +1300Teach planner about more monotonic window functions
9d9c02ccd introduced runConditions for window functions to allow
monotonic window function evaluation to be made more efficient when the
window function value went beyond some value that it would never go
back
from due to its monotonic nature. That commit added prosupport
functions
to inform the planner that row_number(), rank(), dense_rank() and some
forms of count(*) were monotonic. Here we add support for ntile(),
cume_dist() and percent_rank().So I'm betting there's something wrong with the ntile support
function, but I've not dug deeper.
Here is a simplified repro query:
select 1 from
(select ntile(s1.x) over () as c
from (select (select 1) as x) as s1) s
where s.c = 1;
ERROR: WindowFunc not found in subplan target lists
I think the problem is that in find_window_run_conditions() we fail to
check thoroughly whether the WindowFunc contains any potential subplan
nodes. We do have the check:
/* can't use it if there are subplans in the WindowFunc */
if (contain_subplans((Node *) wfunc))
return false;
... and the bug being reported here indicates that the current check is
insufficient, because a Var node inside the WindowFunc could potentially
belong to a subquery and reference a SubLink expression.
In the query above, the WindowFunc's argument, 's1.x', is actually an
EXPR_SUBLINK sublink. After we've pulled up subquery 's1', the
arguments of the two WindowFuncs (one in the target list of subquery
's', the other in the WindowClause's runCondition of subquery 's') would
be replaced with SubLink nodes. And then after we've converted SubLink
nodes to SubPlans, these two SubLink nodes would be replaced with
Params, but with different paramids.
Hmm, is there a way to detect in find_window_run_conditions() whether
the WindowFunc's argument references a SubLink node of a subquery that
hasn't been pulled up?
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
Hmm, is there a way to detect in find_window_run_conditions() whether
the WindowFunc's argument references a SubLink node of a subquery that
hasn't been pulled up?
contain_subplans will recognize both SubLinks and SubPlans,
so there must be some other factor here. (I didn't look yet.)
regards, tom lane
Richard Guo <guofenglinux@gmail.com> writes:
I think the problem is that in find_window_run_conditions() we fail to
check thoroughly whether the WindowFunc contains any potential subplan
nodes. We do have the check:
/* can't use it if there are subplans in the WindowFunc */
if (contain_subplans((Node *) wfunc))
return false;
... and the bug being reported here indicates that the current check is
insufficient, because a Var node inside the WindowFunc could potentially
belong to a subquery and reference a SubLink expression.
I dug through this, and my conclusion is that the entire "run
condition" optimization is seriously broken, because the tree
manipulations here are many bricks shy of a load. There are
two big problems:
1. The subquery hasn't yet been through any preprocessing. Thus,
we've not yet done subquery pullup within it, which is why the
contain_subplans call fails to detect a problem: the WindowFunc's arg
is just a Var. Later, pullup of the lower subquery will replace that
Var with a SubLink, which eventually gets replaced with a Param
referencing an initplan's output. However, run condition optimization
caused us to put a second copy of the WindowFunc into the
runCondition, and that copy gets its own copy of the SubLink, which
eventually results in a different Param, leading to the observed
failure. (BTW, it scares the heck out of me that we just link the
WindowFunc expr into wclause->runCondition without making a copy of
it. That could be safe if the planner never scribbles on its input,
but that ain't so.)
2. We are pushing the "otherexpr" from the upper-level query
(which *has* been through preprocessing) into the unprocessed
child query. This makes me acutely uncomfortable:
A. Almost certainly, this fails if otherexpr contains a subplan.
B. Since nothing is done about levelsup adjustment, it will
fail if anything in that tree includes a levelsup field.
Fortunately is_pseudo_constant_clause will reject Vars,
but those aren't the only things with levelsup. I wonder
if it's possible to get an uplevel Aggref into a subquery's
restrictlist, in particular.
C. Again, failure to make a copy of the expression tree seems
pretty dangerous.
D. We're relying on repeat preprocessing of the otherexpr to
do no harm. Maybe that's OK, but it's not great.
The first three things are not hard to fix, but they're not being
taken care of today.
We could work around point 1 by refusing to pull up subqueries that
contain sublinks in their targetlists, but that would be a pretty big
change (and, probably, a pessimization of some queries). I do not
consider run-condition optimization to justify that. Moreover
I'm not sure that sublinks are the only thing that could get
mutated to a different state in the runCondition than in the main
tree.
I think the only real way to prevent problems from point 1 is to stop
making a copy of the WindowFunc expr. We need some other way to refer
to the WindowFunc's value in the runCondition tree. Maybe a generated
Param would serve?
regards, tom lane
On Wed, 24 Jan 2024 at 08:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We could work around point 1 by refusing to pull up subqueries that
contain sublinks in their targetlists, but that would be a pretty big
change (and, probably, a pessimization of some queries). I do not
consider run-condition optimization to justify that. Moreover
I'm not sure that sublinks are the only thing that could get
mutated to a different state in the runCondition than in the main
tree.I think the only real way to prevent problems from point 1 is to stop
making a copy of the WindowFunc expr. We need some other way to refer
to the WindowFunc's value in the runCondition tree. Maybe a generated
Param would serve?
I'm wondering if it was wrong to put the runCondition field in
WindowClause. Maybe it should be in WindowFunc instead...
Really the OpExpr that's created in find_window_run_conditions() with
the WindowFunc as an arg is there to show the Run Condition in
EXPLAIN. This is what's later stored in WindowAgg.runConditionOrig.
What we pass to ExecQual in nodeWindowAgg.c is a version of the OpExpr
with the WindowFunc replaced with a Var so that ExecQual properly
fetches the just-calculated WindowFunc return value from the slot. We
don't want to leave the WindowFunc in the runCondition as ExecQual
would go and evaluate it.
If WindowFunc allowed a list of a new struct called WindowRunCondition
with fields "otherarg", "opno", "collation", "wfunc_left" then we
could construct the OpExpr later either in createplan.c or setrefs.c.
The EXPLAIN version of that OpExpr could have the WindowFunc and the
non-EXPLAIN version would have the Var.
Sounds a bit invasive for back branches, but wondering if we couldn't
just modify window_ntile_support() to reject any ntile args other than
Consts. count(*), row_number(), rank(), dense_rank(), percent_rank()
and percent_rank() all can't suffer from this issue as they don't have
an argument. count(expr) would need to have something done to stop
the same issue from occurring. Maybe int8inc_support() could just set
req->monotonic = MONOTONICFUNC_NONE if the req->window_func has an
arg, effectively disabling the optimisation for count(expr).
David
On Thu, 25 Jan 2024 at 18:14, David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 24 Jan 2024 at 08:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We could work around point 1 by refusing to pull up subqueries that
contain sublinks in their targetlists, but that would be a pretty big
change (and, probably, a pessimization of some queries). I do not
consider run-condition optimization to justify that. Moreover
I'm not sure that sublinks are the only thing that could get
mutated to a different state in the runCondition than in the main
tree.I think the only real way to prevent problems from point 1 is to stop
making a copy of the WindowFunc expr. We need some other way to refer
to the WindowFunc's value in the runCondition tree. Maybe a generated
Param would serve?I'm wondering if it was wrong to put the runCondition field in
WindowClause. Maybe it should be in WindowFunc instead...Really the OpExpr that's created in find_window_run_conditions() with
the WindowFunc as an arg is there to show the Run Condition in
EXPLAIN. This is what's later stored in WindowAgg.runConditionOrig.
What we pass to ExecQual in nodeWindowAgg.c is a version of the OpExpr
with the WindowFunc replaced with a Var so that ExecQual properly
fetches the just-calculated WindowFunc return value from the slot. We
don't want to leave the WindowFunc in the runCondition as ExecQual
would go and evaluate it.If WindowFunc allowed a list of a new struct called WindowRunCondition
with fields "otherarg", "opno", "collation", "wfunc_left" then we
could construct the OpExpr later either in createplan.c or setrefs.c.
The EXPLAIN version of that OpExpr could have the WindowFunc and the
non-EXPLAIN version would have the Var.
Just to assist the discussion here I've drafted a patch along the
lines of the above. See attached
If you think this idea has merit I can try and turn it into something
committable for master.
Sounds a bit invasive for back branches, but wondering if we couldn't
just modify window_ntile_support() to reject any ntile args other than
Consts. count(*), row_number(), rank(), dense_rank(), percent_rank()
and percent_rank() all can't suffer from this issue as they don't have
an argument. count(expr) would need to have something done to stop
the same issue from occurring. Maybe int8inc_support() could just set
req->monotonic = MONOTONICFUNC_NONE if the req->window_func has an
arg, effectively disabling the optimisation for count(expr).
I'm still unsure about the fix for back branches but I'm open to other ideas.
David
Attachments:
poc_runcondition_fix_draft.patchtext/plain; charset=US-ASCII; name=poc_runcondition_fix_draft.patchDownload+105-52
On Fri, Jan 26, 2024 at 8:02 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 25 Jan 2024 at 18:14, David Rowley <dgrowleyml@gmail.com> wrote:
If WindowFunc allowed a list of a new struct called WindowRunCondition
with fields "otherarg", "opno", "collation", "wfunc_left" then we
could construct the OpExpr later either in createplan.c or setrefs.c.
The EXPLAIN version of that OpExpr could have the WindowFunc and the
non-EXPLAIN version would have the Var.Just to assist the discussion here I've drafted a patch along the
lines of the above. See attachedIf you think this idea has merit I can try and turn it into something
committable for master.
This idea seems reasonable to me. Now the runCondition is constructed
in create_one_window_path(), where the subquery has been through
preprocessing and therefore the WindowFunc's arg has been replaced with
a Param due to the pullup of the lower subquery and the expansion of
SubLinks to SubPlans. This fixes the problem reported here.
Thanks
Richard
On Thu, Jan 25, 2024 at 1:14 PM David Rowley <dgrowleyml@gmail.com> wrote:
Sounds a bit invasive for back branches, but wondering if we couldn't
just modify window_ntile_support() to reject any ntile args other than
Consts. count(*), row_number(), rank(), dense_rank(), percent_rank()
and percent_rank() all can't suffer from this issue as they don't have
an argument. count(expr) would need to have something done to stop
the same issue from occurring. Maybe int8inc_support() could just set
req->monotonic = MONOTONICFUNC_NONE if the req->window_func has an
arg, effectively disabling the optimisation for count(expr).
You're right that count(expr) also suffers from this issue.
select 1 from
(select count(s1.x) over () as c
from (select (select 1) as x) as s1) s
where s.c = 1;
ERROR: WindowFunc not found in subplan target lists
For back branches, the idea of modifying window_ntile_support() and
int8inc_support() to reject any non-pseudoconstant args also seems
reasonable to me. One thing I noticed is that sometimes it's not easy
to tell whether the arg is pseudoconstant or not in the support
functions, because a pseudoconstant is not necessarily being type of
Const. For instance, count(1::text) is a CoerceViaIO, and
ntile(1.0::int) is a FuncExpr. But these are very corner cases and I
think we can just ignore them.
Thanks
Richard
On Mon, 1 Apr 2024 at 21:34, akuluasan <akuluasan@163.com> wrote:
I am using my tool to simplify the SQL query. Can you please confirm if the simplification process helps you diagnose and locate bugs?
The root cause of this has already been established, so in this case,
the answer is "no".
However, since you wrote "bugs" rather than "this bug", for cases
where we've not found the root cause, the more simple the reproducer,
the better.
David
Import Notes
Reply to msg id not found: 9b6be25.1947.18e98ca5d19.Coremail.akuluasan@163.com
On Fri, 22 Mar 2024 at 23:47, Richard Guo <guofenglinux@gmail.com> wrote:
For back branches, the idea of modifying window_ntile_support() and
int8inc_support() to reject any non-pseudoconstant args also seems
reasonable to me. One thing I noticed is that sometimes it's not easy
to tell whether the arg is pseudoconstant or not in the support
functions, because a pseudoconstant is not necessarily being type of
Const. For instance, count(1::text) is a CoerceViaIO, and
ntile(1.0::int) is a FuncExpr. But these are very corner cases and I
think we can just ignore them.
I don't think that needs anything special aside from constant folding.
I've attached a more complete version of the patch (0002) and another
patch which is what I'd proposed as a fix for the backbranches (0001).
Note quite a few tests needed to be adjusted because of disabling this
optimisation.
The 0002 patch will require a cat version bump as it adds a field to
WindowFunc. Ideally, I'd be applying this fix to master, but I
imagine some people might feel we should delay applying a fix like
this until after we branch for v18. Happy to hear people's views on
that.
David
Attachments:
v2-0001-Disable-run-conditions-for-ntile-var-and-count-va.patchtext/plain; charset=US-ASCII; name=v2-0001-Disable-run-conditions-for-ntile-var-and-count-va.patchDownload+105-57
v2-0002-Re-instate-ntile-var-and-count-var-run-condition-.patchtext/plain; charset=US-ASCII; name=v2-0002-Re-instate-ntile-var-and-count-var-run-condition-.patchDownload+207-148
David Rowley <dgrowleyml@gmail.com> writes:
I've attached a more complete version of the patch (0002) and another
patch which is what I'd proposed as a fix for the backbranches (0001).
Note quite a few tests needed to be adjusted because of disabling this
optimisation.
IIUC, 0002 is meant to be applied on top of 0001, but it reverses
quite a lot of 0001? Please don't commit it like that, it'll just
add a lot of useless thrashing to "git blame" results.
It'd be easier to review this if you presented it as two independent
patches, one for HEAD and one for the back branches.
The fact that you had to use a cheesy "eval_const_expressions(NULL,
..." call in 0001 demonstrates that it was a mistake to not include
PlannerInfo in SupportRequestWFuncMonotonic, as every other planner-
invoked support request has. I realize that we can't change that
in back branches, and that it's no longer immediately necessary
in HEAD either after 0002. But let's learn from experience and
add it to the struct while we're here.
The 0002 patch will require a cat version bump as it adds a field to
WindowFunc. Ideally, I'd be applying this fix to master, but I
imagine some people might feel we should delay applying a fix like
this until after we branch for v18. Happy to hear people's views on
that.
catversion bumps are not a problem at this stage --- we will surely
do at least one more for assigned-OID renumbering. I'd rather avoid
the git history thrashing implicit in treating 0002 as a follow-on
patch.
regards, tom lane
On Fri, 26 Apr 2024 at 03:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
It'd be easier to review this if you presented it as two independent
patches, one for HEAD and one for the back branches.
The attached v2 is the same patch as earlier and is intended for <= v16.
The fact that you had to use a cheesy "eval_const_expressions(NULL,
..." call in 0001 demonstrates that it was a mistake to not include
PlannerInfo in SupportRequestWFuncMonotonic, as every other planner-
invoked support request has. I realize that we can't change that
in back branches, and that it's no longer immediately necessary
in HEAD either after 0002. But let's learn from experience and
add it to the struct while we're here.
The correct PlannerInfo to set here would be the one that the
WindowFunc belongs to. The problem is that this code is called from
set_subquery_pathlist before the rel->subroot = subquery_planner. i.e
we've no PlannerInfo to set. Maybe it's worth doing this for
SupportRequestOptimizeWindowClause as a separate patch.
catversion bumps are not a problem at this stage
The attached v3 is a separate patch for v17 only.
David
Attachments:
v3-0001-Fix-query-pullup-issue-with-WindowClause-runCondi.patchtext/plain; charset=US-ASCII; name=v3-0001-Fix-query-pullup-issue-with-WindowClause-runCondi.patchDownload+142-53
v2-0001-Disable-run-conditions-for-ntile-var-and-count-va.patchtext/plain; charset=US-ASCII; name=v2-0001-Disable-run-conditions-for-ntile-var-and-count-va.patchDownload+105-57
On Tue, 30 Apr 2024 at 09:39, David Rowley <dgrowleyml@gmail.com> wrote:
The attached v2 is the same patch as earlier and is intended for <= v16.
I want to commit this one so it makes it into the next minor version
releases. It's not very complex, so plan to push it later today.
I'll hold off with the v17 version for a while.
David