Improper const-evaluation of HAVING with grouping sets and subquery pullup

Started by Heikki Linnakangasover 8 years ago21 messagesbugs
Jump to latest
#1Heikki Linnakangas
heikki.linnakangas@enterprisedb.com

This query produces an incorrect result:

regression=# select four, x
from (select four, ten, 'foo'::text as x from tenk1 ) as t
group by grouping sets(four, x) having x = 'foo' order by four;
four | x
------+-----
0 |
1 |
2 |
3 |
| foo
(5 rows)

The "having x = 'foo'" clause should've filtered out the rows where x is
NULL, leaving only the last row as the result. Even though x is a
constant 'foo' in the subquery, HAVING clause is supposed to be
evaluated after grouping. What happens is that subquery pullup replaces
x with the constant, and the "'foo' = 'foo'" qual is later
const-evaluated to true.

I propose the attached patch to fix that. It forces the use of
PlaceHolderVars in subquery pullup, if the parent query has grouping
sets and HAVING. I'm not 100% sure that's the right approach or a misuse
of the placeholder system, so comments welcome. At first, I tried to set
wrap_non_vars=true only when processing the havingQual, so that
placeholders would only be there. But that didn't work out, I think
because grouping sets planning would then put both the Const, and the
PlaceHolderVar for the Const, in the Agg's targetlist, but only one of
them would be set to NULL when doing the grouping.

Another thing is that the check could be made much tighter, so that
PlaceHolderVars were only used for expressions actually used in the
HAVING. But it didn't seem worth the trouble to me.

- Heikki

Attachments:

0001-Fix-subquery-pullup-into-a-query-containing-GROUPING.patchtext/x-patch; name=0001-Fix-subquery-pullup-into-a-query-containing-GROUPING.patchDownload+60-9
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#1)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Heikki Linnakangas <hlinnaka@iki.fi> writes:

This query produces an incorrect result:
regression=# select four, x
from (select four, ten, 'foo'::text as x from tenk1 ) as t
group by grouping sets(four, x) having x = 'foo' order by four;

The "having x = 'foo'" clause should've filtered out the rows where x is
NULL, leaving only the last row as the result. Even though x is a
constant 'foo' in the subquery, HAVING clause is supposed to be
evaluated after grouping. What happens is that subquery pullup replaces
x with the constant, and the "'foo' = 'foo'" qual is later
const-evaluated to true.

Ouch.

I propose the attached patch to fix that. It forces the use of
PlaceHolderVars in subquery pullup, if the parent query has grouping
sets and HAVING. I'm not 100% sure that's the right approach or a misuse
of the placeholder system, so comments welcome.

Seems like the point is that grouping sets can inject null values of
columns, in more or less the same way that outer joins can. So it
seems like using PlaceHolderVars, which were invented to account
for that property of outer joins, is a reasonable approach to a fix.
I don't have time to review the patch in detail right now though;
do you want to put it in the CF queue?

One thing I'm wondering is why only the HAVING clause would be subject
to the problem. I'm a bit surprised that the "x" in the targetlist
didn't become a constant as well. This may be pointing to some
klugery in the GROUPING SETS patch that we could clean up if we
use placeholders for this.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#2)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> One thing I'm wondering is why only the HAVING clause would be
Tom> subject to the problem. I'm a bit surprised that the "x" in the
Tom> targetlist didn't become a constant as well. This may be pointing
Tom> to some klugery in the GROUPING SETS patch that we could clean up
Tom> if we use placeholders for this.

This shows that the problem can extend to the targetlist too:

select four, x || 'x'
from (select four, ten, 'foo'::text as x from tenk1 ) as t
group by grouping sets(four, x);

four | ?column?
------+----------
3 | foox
0 | foox
1 | foox
2 | foox
| foox
(5 rows)

What seems to happen in the original case is that the 'foo' constant
ends up in the projection of the input to the aggregate node, with a Var
in the output.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#2)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

I propose the attached patch to fix that. It forces the use of
PlaceHolderVars in subquery pullup, if the parent query has grouping
sets and HAVING. I'm not 100% sure that's the right approach or a
misuse of the placeholder system, so comments welcome.

I've been testing Heikki's patch with the havingQual condition removed,
and I haven't found any issues yet.

Tom> One thing I'm wondering is why only the HAVING clause would be
Tom> subject to the problem. I'm a bit surprised that the "x" in the
Tom> targetlist didn't become a constant as well. This may be pointing
Tom> to some klugery in the GROUPING SETS patch that we could clean up
Tom> if we use placeholders for this.

As far as I can tell, the "x" _does_ become a constant, but then in
setrefs, because it's still labelled with a sortgroupref, it gets
replaced by a Var again. I don't recall touching any of that in the GS
work, because it was already like that for plain GROUP BY.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#5Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Andrew Gierth (#4)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

"Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

Andrew> I've been testing Heikki's patch with the havingQual condition
Andrew> removed, and I haven't found any issues yet.

Further experimentation reveals that this change has the effect of
blocking constant-folding in aggregate input expressions that refer to
constant outputs of the subquery.

Is it worth doing anything about that?

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#5)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

Further experimentation reveals that this change has the effect of
blocking constant-folding in aggregate input expressions that refer to
constant outputs of the subquery.
Is it worth doing anything about that?

Uh ... but I thought the point here is that the outputs aren't really
constant in the presence of grouping sets.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#6)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Further experimentation reveals that this change has the effect of
blocking constant-folding in aggregate input expressions that refer
to constant outputs of the subquery. Is it worth doing anything
about that?

Tom> Uh ... but I thought the point here is that the outputs aren't
Tom> really constant in the presence of grouping sets.

select x, y, sum(z) from (select 1 as x, 2 as y, 3 as z) s
group by grouping sets (x,y);

x and y aren't constants, but z is.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#7)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Uh ... but I thought the point here is that the outputs aren't
Tom> really constant in the presence of grouping sets.

select x, y, sum(z) from (select 1 as x, 2 as y, 3 as z) s
group by grouping sets (x,y);

x and y aren't constants, but z is.

OK, but that just means we should put PHV wrapping around only the
grouping-set columns.

BTW, also need to think about GS expressions, eg

select x+y, sum(z) from (select 1 as x, 2 as y, 3 as z) s
group by grouping sets (x+y);

Not real sure what needs to happen here.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#9Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#8)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

x and y aren't constants, but z is.

Tom> OK, but that just means we should put PHV wrapping around only the
Tom> grouping-set columns.

Well, can we also take advantage of the fact that we know that anything
that's not in the grouping-set columns must be in an aggregate argument,
and just omit the PHV inside aggregate args? (And even if grouping
columns appear inside aggregate args, they are _not_ nulled out there.)

Tom> BTW, also need to think about GS expressions, eg

Tom> select x+y, sum(z) from (select 1 as x, 2 as y, 3 as z) s
Tom> group by grouping sets (x+y);

Tom> Not real sure what needs to happen here.

That one currently works (note you have to add another grouping set to
test it, since the case of exactly one grouping set is reduced to plain
GROUP BY) because setrefs fixes up the reference after-the-fact,
replacing the outer x+y (or whatever it got munged to) with a Var based
on matching the sortgroupref. This currently fails:

select (x+y)*1, sum(z) from (select 1 as x, 2 as y, 3 as z) s
group by grouping sets (x+y, x);

because the logic in setrefs that would normally detect that (x+y)
exists in the child tlist doesn't fire because the whole expression was
replaced by a constant.

With the patch to use PHVs it works, but I admit to some confusion over
exactly why.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#10Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andrew Gierth (#9)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Here's another interesting case, without any subqueries:

postgres=# SELECT g as newalias1, g as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3);
newalias1 | newalias3
-----------+-----------
1 | 1
3 | 3
2 | 2
2 | 2
3 | 3
1 | 1
(6 rows)

Why are there no "summary" rows with NULLs, despite the ROLLUP? If you
replace one of the g's with (g+0), you get the expected result:

postgres=# SELECT g as newalias1, (g+0) as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3);
newalias1 | newalias3
-----------+-----------
1 | 1
3 | 3
2 | 2
2 |
3 |
1 |
(6 rows)

- Heikki

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#11Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Heikki Linnakangas (#10)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

"Heikki" == Heikki Linnakangas <hlinnaka@iki.fi> writes:

Heikki> Here's another interesting case, without any subqueries:
Heikki> postgres=# SELECT g as newalias1, g as newalias3
Heikki> FROM generate_series(1,3) g
Heikki> GROUP BY newalias1, ROLLUP(newalias3);
Heikki> newalias1 | newalias3
Heikki> -----------+-----------
Heikki> 1 | 1
Heikki> 3 | 3
Heikki> 2 | 2
Heikki> 2 | 2
Heikki> 3 | 3
Heikki> 1 | 1
Heikki> (6 rows)

Heikki> Why are there no "summary" rows with NULLs, despite the ROLLUP?

To my knowledge this is the correct result. (Though neither version of
the query is legal per the SQL spec; allowing expressions and aliases in
GROUP BY are nonstandard extensions.)

Here's why it happens: after substituting for the aliases, you have

GROUP BY g, rollup(g)

which is equivalent to

GROUP BY GROUPING SETS ((g,g), (g))

which is equivalent to

GROUP BY GROUPING SETS ((g), (g))

because duplicate terms within a single grouping set are redundant just
as they are in GROUP BY.

Heikki> If you replace one of the g's with (g+0), you get the expected
Heikki> result:

Well, in this case the terms in the grouping set are no longer
duplicate; the expansion becomes

GROUP BY GROUPING SETS ((g,(g+0)), (g))

and therefore the (g+0) expression becomes null for one of the resulting
sets.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#12Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Andrew Gierth (#11)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

"Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

Andrew> To my knowledge this is the correct result. (Though neither
Andrew> version of the query is legal per the SQL spec; allowing
Andrew> expressions and aliases in GROUP BY are nonstandard
Andrew> extensions.)

And neither Oracle nor MSSQL (at least the versions available on
sqlfiddle) allow either query, so there's nothing I can compare against.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#11)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Heikki" == Heikki Linnakangas <hlinnaka@iki.fi> writes:
Heikki> Why are there no "summary" rows with NULLs, despite the ROLLUP?

Here's why it happens: after substituting for the aliases, you have
GROUP BY g, rollup(g)
which is equivalent to
GROUP BY GROUPING SETS ((g,g), (g))
which is equivalent to
GROUP BY GROUPING SETS ((g), (g))

I don't think I buy this explanation, because the plan tree doesn't show
any indication that we're actually folding (g,g) to (g):

regression=# EXPLAIN SELECT g as newalias1, g as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3);
QUERY PLAN
--------------------------------------------------------------------------------
HashAggregate (cost=15.00..21.50 rows=400 width=8)
Hash Key: g, g
Hash Key: g
-> Function Scan on generate_series g (cost=0.00..10.00 rows=1000 width=8)
(4 rows)

regression=# EXPLAIN SELECT g as newalias1, g+0 as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3);
QUERY PLAN
--------------------------------------------------------------------------------
HashAggregate (cost=17.50..25.00 rows=400 width=8)
Hash Key: g, (g + 0)
Hash Key: g
-> Function Scan on generate_series g (cost=0.00..12.50 rows=1000 width=8)
(4 rows)

If these behave differently, why does the plan structure look the same?

I think that Heikki's expectation is the correct one, and the reason the
output looks the way it does is that setrefs.c is dropping the ball
somehow and confusing the two "g" references. The finished plan has two
identical Var references in the Agg node's output tlist:

:targetlist (
{TARGETENTRY
:expr
{VAR
:varno 65001
:varattno 1
:vartype 23
:vartypmod -1
:varcollid 0
:varlevelsup 0
:varnoold 1
:varoattno 1
:location 7
}
:resno 1
:resname newalias1
:ressortgroupref 1
:resorigtbl 0
:resorigcol 0
:resjunk false
}
{TARGETENTRY
:expr
{VAR
:varno 65001
:varattno 1
:vartype 23
:vartypmod -1
:varcollid 0
:varlevelsup 0
:varnoold 1
:varoattno 1
:location 23
}
:resno 2
:resname newalias3
:ressortgroupref 2
:resorigtbl 0
:resorigcol 0
:resjunk false
}
)

The two "g" values are correctly distinguished in the FunctionScan's
tlist, however:

:targetlist (
{TARGETENTRY
:expr
{VAR
:varno 1
:varattno 1
:vartype 23
:vartypmod -1
:varcollid 0
:varlevelsup 0
:varnoold 1
:varoattno 1
:location 7
}
:resno 1
:resname <>
:ressortgroupref 1
:resorigtbl 0
:resorigcol 0
:resjunk false
}
{TARGETENTRY
:expr
{VAR
:varno 1
:varattno 1
:vartype 23
:vartypmod -1
:varcollid 0
:varlevelsup 0
:varnoold 1
:varoattno 1
:location 23
}
:resno 2
:resname <>
:ressortgroupref 2
:resorigtbl 0
:resorigcol 0
:resjunk false
}
)

We should have used ressortgroupref matching to prevent this, but without
having checked the code right now, I think that we might only apply that
logic to non-Var tlist entries. If the Agg output tlist had mentioned
column 2 not column 1 of the child node, I bet we'd get the right answer.

Digression: this seems like another example in which the "same" Var can
represent two different values. I've had a bee in my bonnet for awhile
that we need to stop doing that, but I'm not entirely sure what to do
instead. In the case of Vars that might go to null because of an outer
join, we could perhaps fix things by not smashing join alias Vars to their
referents (at least, not till much much later in the planner than we do
now). That doesn't seem to apply here though. Perhaps the GROUP BY
operation ought to be understood as providing its own set of output Vars?
That would mean creating an RTE to represent GROUP BY, but maybe that's
not awful.

In the nearer term, maybe we can get a back-patchable fix by being
more careful about ressortgroupref matching.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#13)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

I wrote:

I think that Heikki's expectation is the correct one, and the reason the
output looks the way it does is that setrefs.c is dropping the ball
somehow and confusing the two "g" references. ...
We should have used ressortgroupref matching to prevent this, but without
having checked the code right now, I think that we might only apply that
logic to non-Var tlist entries. If the Agg output tlist had mentioned
column 2 not column 1 of the child node, I bet we'd get the right answer.

Indeed, the attached patch passes all regression tests and produces the
same answers for both of Heikki's examples:

regression=# SELECT g as newalias1, g as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3);
newalias1 | newalias3
-----------+-----------
1 | 1
3 | 3
2 | 2
2 |
3 |
1 |
(6 rows)

regards, tom lane

Attachments:

check-sortgroupref-for-Vars-too.patchtext/x-diff; charset=us-ascii; name=check-sortgroupref-for-Vars-too.patchDownload+4-5
#15Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#13)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> I don't think I buy this explanation, because the plan tree
Tom> doesn't show any indication that we're actually folding (g,g) to
Tom> (g):

Huh. Yes, you're right.

Tom> Digression: this seems like another example in which the "same"
Tom> Var can represent two different values.

At one point in the evolution of the GS patch it had a new node type,
GroupedVar or some such name, to represent the possibly-nulled-out
values. I'm not sure that it was being introduced early enough in
planning to have prevented this problem (I suspect not, I'd have to dig
up the old code). That stuff was all ripped out very late in the
development process because as I recall, both you and Andres disliked
it.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#14)
Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

I wrote:

We should have used ressortgroupref matching to prevent this, but without
having checked the code right now, I think that we might only apply that
logic to non-Var tlist entries. If the Agg output tlist had mentioned
column 2 not column 1 of the child node, I bet we'd get the right answer.

Indeed, the attached patch passes all regression tests and produces the
same answers for both of Heikki's examples:

I did some back-porting work on this. The patch applies back to 9.5
where grouping sets were introduced, but it only fixes the problem
in 9.6 and later --- in 9.5, you still get the wrong output :-(.

Bisecting says the behavior changed at commit 3fc6e2d7f "Make the upper
part of the planner work by generating and comparing Paths". Ugh.
We are certainly not back-patching that into 9.5, and I'm disinclined
even to try to identify exactly why that commit changed the behavior.

Given that this is such a weird corner case, and we've not heard
complaints from actual users about it, I'm inclined just to apply
the patch in 9.6 and up and call it good.

regards, tom lane

Attachments:

check-sortgroupref-for-Vars-too-2.patchtext/x-diff; charset=us-ascii; name=check-sortgroupref-for-Vars-too-2.patchDownload+44-5
#17Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#16)
Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

On Thu, Oct 26, 2017 at 4:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Given that this is such a weird corner case, and we've not heard
complaints from actual users about it, I'm inclined just to apply
the patch in 9.6 and up and call it good.

Tom, are you planning to finish wrapping up this one? For now I am
moving it to next CF.
--
Michael

#18Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Michael Paquier (#17)
Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

"Michael" == Michael Paquier <michael.paquier@gmail.com> writes:

Michael> Tom, are you planning to finish wrapping up this one? For now
Michael> I am moving it to next CF.

I can take it if Tom and Heikki are busy.

--
Andrew.

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#18)
Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Michael" == Michael Paquier <michael.paquier@gmail.com> writes:
Michael> Tom, are you planning to finish wrapping up this one? For now
Michael> I am moving it to next CF.

I can take it if Tom and Heikki are busy.

FWIW, I'm not really comfortable that the proposed patch is correct
or complete. It may just need more study to get there.

regards, tom lane

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#19)
Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

I wrote:

FWIW, I'm not really comfortable that the proposed patch is correct
or complete. It may just need more study to get there.

I've done another round of study on this patch. The attached updated
version is the same code as Heikki proposed (minus the incorrect
restriction to queries with HAVING quals), but I reworked the comments
and expanded the regression test cases.

One thing that wasn't clear to me before was whether we need wrap_non_vars
for this case or not; we don't for outer joins, so I was unconvinced about
it here. It turns out we do: the point of the wrapper is to prevent
constant folding or other expression preprocessing from merging a
pulled-up expression with the surrounding expression, resulting in
something that won't match the grouping set expression when it comes time
to do that matching. For instance if we have a boolean subquery output
expression, say "x = y as cond", and that gets hoisted into an upper
expression "not cond", then without the PHV wrapper we will happily
simplify that to "x <> y" which will not match the grouping set
expression. There's a regression test below that misbehaves if you take
out the "wrap_non_vars = true" line.

I spent some time thinking about Andrew's observation that we don't really
need the wrappers everyplace. It's true, but pullup_replace_vars is far
from being able to do the right thing there, and I'm not sure that trying
to teach it to do so is reasonable. (I'm inclined to think that the idea
I threw out upthread about converting grouping expressions into Vars
belonging to a new RTE kind might be the way to go.) In any case I don't
think we'd possibly come out with a patch simple enough to back-patch.
So let's leave that optimization for future work.

I think the attached is probably ready to go, though I've not checked yet
whether it will work pre-9.6. Anyone want to do more review?

regards, tom lane

Attachments:

fix-grouping-sets-pullup-v2.patchtext/x-diff; charset=us-ascii; name=fix-grouping-sets-pullup-v2.patchDownload+113-13
#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#20)