BUG #17088: FailedAssertion in prepagg.c

Started by PG Bug reporting formalmost 5 years ago12 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17088
Logged by: yaoguang chen
Email address: cyg0810@gmail.com
PostgreSQL version: 14beta2
Operating system: Linux supersix 5.4.0-39-generic #43-Ubuntu SMP Fri
Description:

run the following sql command through client and the PostgreSQL database
process will crash:

CREATE TEMP TABLE v0 ( v1 INT PRIMARY KEY ) ON COMMIT DELETE ROWS ;
SELECT FROM ( VALUES ( ( SELECT - - 84 FROM v0 LIMIT - -1 ) ) ) v1 ( v1 )
GROUP BY ROLLUP ( v1 , v1 ) , ROLLUP ( ROW ( ) , ROW ( - - - - -128 ,
6099928.000000 ) , v1 ) ORDER BY v1 = v1 AND v1 = - - ( SELECT GROUPING ( v1
) GROUP BY v1 ) ASC FETCH FIRST ROWS WITH TIES

crash log:

HINT: Future log output will go to log destination "csvlog".
TRAP: FailedAssertion("!IsA(node, SubLink)", File:
"/home/supersix/fuzz/Squirrel/PostgreSQL/postgres/build/../src/backend/optimizer/prep/prepagg.c",
Line: 341, PID: 2310031)
postgres: supersix x 127.0.0.1(65126)
SELECT(ExceptionalCondition+0xbb)[0x562398112ffb]
postgres: supersix x 127.0.0.1(65126)
SELECT(preprocess_aggrefs+0x0)[0x562397d2da10]
postgres: supersix x 127.0.0.1(65126)
SELECT(expression_tree_walker+0x95)[0x562397c8c755]
postgres: supersix x 127.0.0.1(65126)
SELECT(expression_tree_walker+0x641)[0x562397c8cd01]
postgres: supersix x 127.0.0.1(65126)
SELECT(expression_tree_walker+0x95)[0x562397c8c755]
postgres: supersix x 127.0.0.1(65126)
SELECT(expression_tree_walker+0x7ef)[0x562397c8ceaf]
postgres: supersix x 127.0.0.1(65126)
SELECT(expression_tree_walker+0x95)[0x562397c8c755]
postgres: supersix x 127.0.0.1(65126)
SELECT(expression_tree_walker+0x159)[0x562397c8c819]
postgres: supersix x 127.0.0.1(65126)
SELECT(expression_tree_walker+0x95)[0x562397c8c755]
postgres: supersix x 127.0.0.1(65126)
SELECT(expression_tree_walker+0x159)[0x562397c8c819]
postgres: supersix x 127.0.0.1(65126)
SELECT(expression_tree_walker+0x95)[0x562397c8c755]
postgres: supersix x 127.0.0.1(65126)
SELECT(expression_tree_walker+0x159)[0x562397c8c819]
postgres: supersix x 127.0.0.1(65126)
SELECT(expression_tree_walker+0x95)[0x562397c8c755]
postgres: supersix x 127.0.0.1(65126)
SELECT(expression_tree_walker+0x832)[0x562397c8cef2]
postgres: supersix x 127.0.0.1(65126)
SELECT(expression_tree_walker+0x95)[0x562397c8c755]
postgres: supersix x 127.0.0.1(65126) SELECT(+0x592dd9)[0x562397d1bdd9]
postgres: supersix x 127.0.0.1(65126)
SELECT(subquery_planner+0xf63)[0x562397d1e8e3]
postgres: supersix x 127.0.0.1(65126)
SELECT(standard_planner+0x165)[0x562397d1f535]
postgres: supersix x 127.0.0.1(65126)
SELECT(pg_plan_query+0x6a)[0x562397ebceaa]
postgres: supersix x 127.0.0.1(65126)
SELECT(pg_plan_queries+0x4d)[0x562397ebcffd]
postgres: supersix x 127.0.0.1(65126) SELECT(+0x7359f2)[0x562397ebe9f2]
postgres: supersix x 127.0.0.1(65126)
SELECT(PostgresMain+0x1ae7)[0x562397ec0d57]
postgres: supersix x 127.0.0.1(65126) SELECT(+0x61671f)[0x562397d9f71f]
postgres: supersix x 127.0.0.1(65126)
SELECT(PostmasterMain+0x1182)[0x562397da2672]
postgres: supersix x 127.0.0.1(65126) SELECT(main+0x533)[0x562397852133]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f6aa088f0b3]
postgres: supersix x 127.0.0.1(65126) SELECT(_start+0x2e)[0x56239785228e]

#2Michael Paquier
michael@paquier.xyz
In reply to: PG Bug reporting form (#1)
Re: BUG #17088: FailedAssertion in prepagg.c

On Wed, Jul 07, 2021 at 06:33:07AM +0000, PG Bug reporting form wrote:

run the following sql command through client and the PostgreSQL database
process will crash:

CREATE TEMP TABLE v0 ( v1 INT PRIMARY KEY ) ON COMMIT DELETE ROWS ;
SELECT FROM ( VALUES ( ( SELECT - - 84 FROM v0 LIMIT - -1 ) ) ) v1 ( v1 )
GROUP BY ROLLUP ( v1 , v1 ) , ROLLUP ( ROW ( ) , ROW ( - - - - -128 ,
6099928.000000 ) , v1 ) ORDER BY v1 = v1 AND v1 = - - ( SELECT GROUPING ( v1
) GROUP BY v1 ) ASC FETCH FIRST ROWS WITH TIES

Reproduced here, thanks for the test case. As far as I can see, this
is not limited to 14.
--
Michael

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#2)
Re: BUG #17088: FailedAssertion in prepagg.c

Michael Paquier <michael@paquier.xyz> writes:

CREATE TEMP TABLE v0 ( v1 INT PRIMARY KEY ) ON COMMIT DELETE ROWS ;
SELECT FROM ( VALUES ( ( SELECT - - 84 FROM v0 LIMIT - -1 ) ) ) v1 ( v1 )
GROUP BY ROLLUP ( v1 , v1 ) , ROLLUP ( ROW ( ) , ROW ( - - - - -128 ,
6099928.000000 ) , v1 ) ORDER BY v1 = v1 AND v1 = - - ( SELECT GROUPING ( v1
) GROUP BY v1 ) ASC FETCH FIRST ROWS WITH TIES

Reproduced here, thanks for the test case. As far as I can see, this
is not limited to 14.

Yeah, this looks like it probably dates back to the addition of
GroupingFunc. The test case can be simplified a good deal:

SELECT (SELECT GROUPING(v1)) FROM (VALUES ((SELECT 1))) v(v1) GROUP BY cube(v1);
server closed the connection unexpectedly

I also found a probably-related variant:

SELECT (SELECT GROUPING(v1)) FROM (VALUES ((SELECT 1))) v(v1) GROUP BY v1;
ERROR: plan should not reference subplan's variable

These cases don't fail if the GROUPING call isn't inside a sub-select.

The proximate cause of the assertion failure is that preprocess_aggrefs
isn't expecting to find a SubLink, which is reasonable since we should
have removed them already. However, what it's actually seeing is

{TARGETENTRY
:expr
{SUBPLAN
...
:args (
{GROUPINGFUNC
:args (
{PLACEHOLDERVAR
:phexpr
{SUBLINK
...
}
...

If we don't put GROUPING(v1) inside a sub-SELECT, it looks like

{GROUPINGFUNC
:args (
{PLACEHOLDERVAR
:phexpr
{PARAM
:paramkind 1
:paramid 0
:paramtype 23
:paramtypmod -1
:paramcollid 0
:location -1
}
:phrels (b 2)
:phid 1
:phlevelsup 0
}
)
:refs (i 1)
:cols <>
:agglevelsup 0
:location 15
}

which seems a whole lot saner. So I surmise that somebody is
missing doing something relevant to the "args" list of a SubPlan.

An alternative theory is that we should never have done anything
at all to the argument tree of a GroupingFunc. Since it's not
supposed to be evaluated, treating it as a target for expression
preprocessing might be a mistake altogether. I wonder why its
arguments aren't stored as sortgroupref indexes or the like.

regards, tom lane

#4Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#3)
Re: BUG #17088: FailedAssertion in prepagg.c

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> An alternative theory is that we should never have done anything
Tom> at all to the argument tree of a GroupingFunc. Since it's not
Tom> supposed to be evaluated, treating it as a target for expression
Tom> preprocessing might be a mistake altogether. I wonder why its
Tom> arguments aren't stored as sortgroupref indexes or the like.

The arguments are stored as sortgrouprefs (the "refs" list) for actual
use; the "args" list is only kept for the benefit of EXPLAIN and
deparsing.

A number of places in the planner have to explicitly avoid recursing
into GroupingFunc->args when walking trees specifically because they are
not evaluated. It looks to me like some places where that should have
been checked for were missed. Looking into it.

--
Andrew (irc:RhodiumToad)

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#4)
Re: BUG #17088: FailedAssertion in prepagg.c

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

A number of places in the planner have to explicitly avoid recursing
into GroupingFunc->args when walking trees specifically because they are
not evaluated. It looks to me like some places where that should have
been checked for were missed. Looking into it.

Hmm. Maybe it'd be better if the default behavior in
expression_tree_walker/mutator did not include recursing into the args,
then? I am thinking this might be comparable to SubLinks/SubPlans, where
the walker has to take explicit action if it wants to recurse into the
sub-query.

regards, tom lane

#6Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#5)
Re: BUG #17088: FailedAssertion in prepagg.c

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

A number of places in the planner have to explicitly avoid recursing
into GroupingFunc->args when walking trees specifically because they
are not evaluated. It looks to me like some places where that should
have been checked for were missed. Looking into it.

Tom> Hmm. Maybe it'd be better if the default behavior in
Tom> expression_tree_walker/mutator did not include recursing into the
Tom> args, then?

You'd think, but as I recall (I will re-check this to confirm) there
were more places where we _did_ need to recurse (especially during parse
analysis before we've matched up the sortgrouprefs), while most of the
places where recursion needed to be explicitly avoided already needed
special-case handling, so having the default the other way would likely
have required a special-case almost everywhere.

--
Andrew.

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#6)
Re: BUG #17088: FailedAssertion in prepagg.c

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Hmm. Maybe it'd be better if the default behavior in
Tom> expression_tree_walker/mutator did not include recursing into the
Tom> args, then?

You'd think, but as I recall (I will re-check this to confirm) there
were more places where we _did_ need to recurse (especially during parse
analysis before we've matched up the sortgrouprefs), while most of the
places where recursion needed to be explicitly avoided already needed
special-case handling, so having the default the other way would likely
have required a special-case almost everywhere.

Fair enough. This is the kind of design choice that can be worth
revisiting later; but if the conclusion is still the same, fine with me.

regards, tom lane

#8Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#7)
Re: BUG #17088: FailedAssertion in prepagg.c

On Thu, Jul 8, 2021 at 5:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Hmm. Maybe it'd be better if the default behavior in
Tom> expression_tree_walker/mutator did not include recursing into the
Tom> args, then?

You'd think, but as I recall (I will re-check this to confirm) there
were more places where we _did_ need to recurse (especially during parse
analysis before we've matched up the sortgrouprefs), while most of the
places where recursion needed to be explicitly avoided already needed
special-case handling, so having the default the other way would likely
have required a special-case almost everywhere.

Fair enough. This is the kind of design choice that can be worth
revisiting later; but if the conclusion is still the same, fine with me.

I think the culprit is that when replacing correlation uplevel vars with
Params, we do not handle the SubLinks in the arguments of uplevel
GroupingFunc. We expect build_subplan should take care of it. But in
build_subplan, we ignore GroupingFunc incorrectly.

diff --git a/src/backend/optimizer/plan/subselect.c
b/src/backend/optimizer/plan/subselect.c
index 0881a208ac..e4918f275e 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -364,7 +364,8 @@ build_subplan(PlannerInfo *root, Plan *plan,
PlannerInfo *subroot,
                 * SS_replace_correlation_vars).  Do that now.
                 */
                if (IsA(arg, PlaceHolderVar) ||
-                       IsA(arg, Aggref))
+                       IsA(arg, Aggref) ||
+                       IsA(arg, GroupingFunc))
                        arg = SS_process_sublinks(root, arg, false);

Thanks
Richard

#9Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#8)
Re: BUG #17088: FailedAssertion in prepagg.c

On Thu, Jul 8, 2021 at 1:44 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Thu, Jul 8, 2021 at 5:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Hmm. Maybe it'd be better if the default behavior in
Tom> expression_tree_walker/mutator did not include recursing into the
Tom> args, then?

You'd think, but as I recall (I will re-check this to confirm) there
were more places where we _did_ need to recurse (especially during parse
analysis before we've matched up the sortgrouprefs), while most of the
places where recursion needed to be explicitly avoided already needed
special-case handling, so having the default the other way would likely
have required a special-case almost everywhere.

Fair enough. This is the kind of design choice that can be worth
revisiting later; but if the conclusion is still the same, fine with me.

I think the culprit is that when replacing correlation uplevel vars with
Params, we do not handle the SubLinks in the arguments of uplevel
GroupingFunc. We expect build_subplan should take care of it. But in
build_subplan, we ignore GroupingFunc incorrectly.

diff --git a/src/backend/optimizer/plan/subselect.c
b/src/backend/optimizer/plan/subselect.c
index 0881a208ac..e4918f275e 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -364,7 +364,8 @@ build_subplan(PlannerInfo *root, Plan *plan,
PlannerInfo *subroot,
* SS_replace_correlation_vars).  Do that now.
*/
if (IsA(arg, PlaceHolderVar) ||
-                       IsA(arg, Aggref))
+                       IsA(arg, Aggref) ||
+                       IsA(arg, GroupingFunc))
arg = SS_process_sublinks(root, arg, false);

I think we also need to change SS_process_sublinks to avoid recursing
into the arguments of an outer GroupingFunc. And that leads to a fix as
attached.

Thanks
Richard

Attachments:

0001-Fix-GroupingFunc-assertion-failure.patchapplication/octet-stream; name=0001-Fix-GroupingFunc-assertion-failure.patchDownload+7-2
#10Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#9)
Re: BUG #17088: FailedAssertion in prepagg.c

On Thu, Jul 8, 2021 at 2:14 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Thu, Jul 8, 2021 at 1:44 PM Richard Guo <guofenglinux@gmail.com> wrote:

I think the culprit is that when replacing correlation uplevel vars with
Params, we do not handle the SubLinks in the arguments of uplevel
GroupingFunc. We expect build_subplan should take care of it. But in
build_subplan, we ignore GroupingFunc incorrectly.

diff --git a/src/backend/optimizer/plan/subselect.c
b/src/backend/optimizer/plan/subselect.c
index 0881a208ac..e4918f275e 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -364,7 +364,8 @@ build_subplan(PlannerInfo *root, Plan *plan,
PlannerInfo *subroot,
* SS_replace_correlation_vars).  Do that now.
*/
if (IsA(arg, PlaceHolderVar) ||
-                       IsA(arg, Aggref))
+                       IsA(arg, Aggref) ||
+                       IsA(arg, GroupingFunc))
arg = SS_process_sublinks(root, arg, false);

I think we also need to change SS_process_sublinks to avoid recursing
into the arguments of an outer GroupingFunc. And that leads to a fix as
attached.

Update the patch with comments and test cases.

Thanks
Richard

Attachments:

v1-0001-Fix-handling-of-outer-GroupingFunc-within-subqueries.patchapplication/octet-stream; name=v1-0001-Fix-handling-of-outer-GroupingFunc-within-subqueries.patchDownload+72-11
#11Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Richard Guo (#10)
Re: BUG #17088: FailedAssertion in prepagg.c

At Fri, 9 Jul 2021 14:54:02 +0800, Richard Guo <guofenglinux@gmail.com> wrote in

Update the patch with comments and test cases.

AFAICS the patch looks correct. It works for the first example and
the two from Tom. I don't find other place that has the similar
issue.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kyotaro Horiguchi (#11)
Re: BUG #17088: FailedAssertion in prepagg.c

Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:

At Fri, 9 Jul 2021 14:54:02 +0800, Richard Guo <guofenglinux@gmail.com> wrote in

Update the patch with comments and test cases.

AFAICS the patch looks correct. It works for the first example and
the two from Tom. I don't find other place that has the similar
issue.

I'd been expecting Andrew to pick this up, but since he hasn't,
I took a look.

I concur that the core problem is that GroupingFunc has to be treated
almost exactly like Aggref, and here we have a couple of places that
didn't get that memo. So it occurred to me to look for other places
that special-case Aggref and don't have parallel code for GroupingFunc,
and I found several:

expression_returns_set_walker

This isn't particularly hazardous, since the argument (probably?) can't
contain a SRF, but it still seems like it ought to treat the two node
types the same.

cost_qual_eval_walker

It's defaulting to charging the eval costs of the arguments, which is
flat wrong. I made it charge one cpu_operator_cost instead.

ruleutils.c

Various places concerned with whether or not we need parens were
making the wrong choice, resulting in excess parens in pretty-printing
mode. This is also just cosmetic, but still.

This looks good to me now, and I'll set about back-patching.

regards, tom lane

Attachments:

v2-0001-fix-oversights-in-GroupingFunc-handling.patchtext/x-diff; charset=us-ascii; name=v2-0001-fix-oversights-in-GroupingFunc-handling.patchDownload+88-14