Fix HAVING-to-WHERE pushdown with mismatched operator families
Continued with the rabbit-hole around equality-relation issues, here
is another one in the HAVING-to-WHERE pushdown path.
f76686ce7 stopped the planner from pushing a HAVING clause to WHERE
when the clause's collation disagreed with the GROUP BY's
nondeterministic collation. The same shape of bug exists with
operator families: when a HAVING clause uses a comparison operator
from a different opfamily than the GROUP BY's eqop, pushing it to
WHERE can produce wrong results.
create type t_rec as (a numeric, b int);
create table t_having (id int, r t_rec);
insert into t_having values
(1, row(100, 1)::t_rec),
(2, row(100.0, 1)::t_rec),
(3, row(2, 2)::t_rec);
-- wrong result: count should be 2
select r, count(*) from t_having group by r having r *= row(100, 1)::t_rec;
r | count
---------+-------
(100,1) | 1
(1 row)
The fix in the attached patch mirrors f76686ce7's structure. We
detect the conflict before flatten_group_exprs while the HAVING clause
still contains GROUP Vars, and record the indices of unsafe clauses in
a Bitmapset that's consulted by the existing pushdown loop.
This issue can be reproduced as far back as v14. However, the fix
relies on RTE_GROUP to identify grouping expressions via GROUP Vars on
pre-flatten havingQual. As with f76686ce7, I'm inclined to back-patch
to v18 only.
Thoughts?
- Richard
Attachments:
v1-0001-Fix-HAVING-to-WHERE-pushdown-with-mismatched-oper.patchapplication/octet-stream; name=v1-0001-Fix-HAVING-to-WHERE-pushdown-with-mismatched-oper.patchDownload+334-37
On Tue, 26 May 2026 at 09:48, Richard Guo <guofenglinux@gmail.com> wrote:
Continued with the rabbit-hole around equality-relation issues, here
is another one in the HAVING-to-WHERE pushdown path.f76686ce7 stopped the planner from pushing a HAVING clause to WHERE
when the clause's collation disagreed with the GROUP BY's
nondeterministic collation. The same shape of bug exists with
operator families: when a HAVING clause uses a comparison operator
from a different opfamily than the GROUP BY's eqop, pushing it to
WHERE can produce wrong results.create type t_rec as (a numeric, b int);
create table t_having (id int, r t_rec);insert into t_having values
(1, row(100, 1)::t_rec),
(2, row(100.0, 1)::t_rec),
(3, row(2, 2)::t_rec);-- wrong result: count should be 2
select r, count(*) from t_having group by r having r *= row(100, 1)::t_rec;
r | count
---------+-------
(100,1) | 1
(1 row)The fix in the attached patch mirrors f76686ce7's structure. We
detect the conflict before flatten_group_exprs while the HAVING clause
still contains GROUP Vars, and record the indices of unsafe clauses in
a Bitmapset that's consulted by the existing pushdown loop.This issue can be reproduced as far back as v14. However, the fix
relies on RTE_GROUP to identify grouping expressions via GROUP Vars on
pre-flatten havingQual. As with f76686ce7, I'm inclined to back-patch
to v18 only.Thoughts?
Makes sense to me, but out of curiosity, while digging into these
opfamily mismatches, have you noticed if this same record_ops vs
record_image_ops inequality poses any risks to other optimisation
paths like window function pushdowns or partition pruning? And
apologies if that has already been discussed, but I couldn't find
mention of it.
Thom
On Tue, May 26, 2026 at 11:06 PM Thom Brown <thom@linux.com> wrote:
Makes sense to me, but out of curiosity, while digging into these
opfamily mismatches, have you noticed if this same record_ops vs
record_image_ops inequality poses any risks to other optimisation
paths like window function pushdowns or partition pruning? And
apologies if that has already been discussed, but I couldn't find
mention of it.
Thanks for raising these points. For partition pruning,
match_clause_to_partition_key() already checks both collation and
opfamily compatibility, so I don't think it has similar issues. I'm
not sure what is meant by "window function pushdowns", but your
question prompted me to look around, and I did notice that pushing
restriction clauses down into a subquery suffers from a similar
problem, specifically, when the subquery has DISTINCT, DISTINCT ON, or
a window PARTITION BY clause.
create type t_rec as (a numeric);
create table t (a t_rec, b int);
insert into t values (row(1.0), 10), (row(1.00), 20);
-- wrong result: should be 0 rows
select * from
(select distinct on (a) a, b from t order by a, b) s
where a *= row(1.00)::t_rec;
a | b
--------+----
(1.00) | 20
(1 row)
-- wrong result: rk should be 2
select * from
(select a, b, rank() over (partition by a order by b) as rk from t) s
where a *= row(1.00)::t_rec;
a | b | rk
--------+----+----
(1.00) | 20 | 1
(1 row)
In addtition, collation mismatch can also cause wrong results in this
area.
create collation ci (provider = icu, locale = 'und-u-ks-level2',
deterministic = false);
create table t1 (a text collate ci, b int);
insert into t1 values ('abc', 1), ('ABC', 2);
-- wrong result: should be 0 rows
select * from
(select distinct on (a) a, b from t1 order by a, b) s
where a = 'ABC' collate "C";
a | b
-----+---
ABC | 2
(1 row)
-- wrong result: rk should be 2
select * from
(select a, b, rank() over (partition by a order by b) as rk from t1) s
where a = 'ABC' collate "C";
a | b | rk
-----+---+----
ABC | 2 | 1
(1 row)
- Richard
On Wed, May 27, 2026 at 8:04 AM Richard Guo <guofenglinux@gmail.com> wrote:
Thanks for raising these points. For partition pruning,
match_clause_to_partition_key() already checks both collation and
opfamily compatibility, so I don't think it has similar issues. I'm
not sure what is meant by "window function pushdowns", but your
question prompted me to look around, and I did notice that pushing
restriction clauses down into a subquery suffers from a similar
problem, specifically, when the subquery has DISTINCT, DISTINCT ON, or
a window PARTITION BY clause.
I think all these issues belong to the same class of bug: the planner
moves a qual clause across a grouping layer, and the result is wrong
when the qual's equivalence relation disagrees with the grouping's,
either an opfamily mismatch or a nondeterministic-collation mismatch.
This includes HAVING-to-WHERE pushdown, as well as qual pushdown into
a subquery past its DISTINCT, DISTINCT ON, window PARTITION BY, or
set-operation grouping layer.
v2 attached tries to fix the full bug class through a shared walker
expression_has_grouping_conflict that detects either kind of conflict
in an expression tree. The walker takes a callback that maps each
Var to the grouping equality operator for its column (or InvalidOid
for non-grouping Vars). See the commit message for details.
- Richard
Attachments:
v2-0001-Fix-qual-pushdown-past-grouping-with-mismatched-e.patchapplication/octet-stream; name=v2-0001-Fix-qual-pushdown-past-grouping-with-mismatched-e.patchDownload+1224-208
On Wed, 27 May 2026, 00:04 Richard Guo, <guofenglinux@gmail.com> wrote:
On Tue, May 26, 2026 at 11:06 PM Thom Brown <thom@linux.com> wrote:
Makes sense to me, but out of curiosity, while digging into these
opfamily mismatches, have you noticed if this same record_ops vs
record_image_ops inequality poses any risks to other optimisation
paths like window function pushdowns or partition pruning? And
apologies if that has already been discussed, but I couldn't find
mention of it.Thanks for raising these points. For partition pruning,
match_clause_to_partition_key() already checks both collation and
opfamily compatibility, so I don't think it has similar issues. I'm
not sure what is meant by "window function pushdowns", but your
question prompted me to look around, and I did notice that pushing
restriction clauses down into a subquery suffers from a similar
problem, specifically, when the subquery has DISTINCT, DISTINCT ON, or
a window PARTITION BY clause.
Yeah, sorry, that wording wasn't clear. I just meant pushing a qual down
past a window function's PARTITION BY, which is the one case the planner
allows. Looks like that's exactly the rank() case you turned up, so you've
answered it better than I could have asked it. The v2 approach of treating
the whole thing as one bug class looks like the right call to me.
Regards
Thom
Show quoted text
Il giorno gio 28 mag 2026 alle ore 11:11 Richard Guo <guofenglinux@gmail.com>
ha scritto:
On Wed, May 27, 2026 at 8:04 AM Richard Guo <guofenglinux@gmail.com>
wrote:Thanks for raising these points. For partition pruning,
match_clause_to_partition_key() already checks both collation and
opfamily compatibility, so I don't think it has similar issues. I'm
not sure what is meant by "window function pushdowns", but your
question prompted me to look around, and I did notice that pushing
restriction clauses down into a subquery suffers from a similar
problem, specifically, when the subquery has DISTINCT, DISTINCT ON, or
a window PARTITION BY clause.I think all these issues belong to the same class of bug: the planner
moves a qual clause across a grouping layer, and the result is wrong
when the qual's equivalence relation disagrees with the grouping's,
either an opfamily mismatch or a nondeterministic-collation mismatch.
This includes HAVING-to-WHERE pushdown, as well as qual pushdown into
a subquery past its DISTINCT, DISTINCT ON, window PARTITION BY, or
set-operation grouping layer.v2 attached tries to fix the full bug class through a shared walker
expression_has_grouping_conflict that detects either kind of conflict
in an expression tree. The walker takes a callback that maps each
Var to the grouping equality operator for its column (or InvalidOid
for non-grouping Vars). See the commit message for details.- Richard
Hi,
The patch fixes DISTINCT/window/set-op subqueries and HAVING, but does it
miss the analogous case for GROUP BY subqueries as the pushdown target?
When an outer qual is pushed into a GROUP BY subquery it lands in
havingQual (correct), but find_having_conflicts then misses the conflict
because the pushed qual carries base-table Vars, not GROUP Vars — so the
clause gets silently moved to WHERE, filtering before aggregation.
Reproducer:
```
CREATE TYPE t_rec AS (x numeric);
CREATE TABLE t_grp (a t_rec);
INSERT INTO t_grp VALUES (ROW(1.0)), (ROW(1.00)), (ROW(2));
-- record_ops (default) considers 1.0 and 1.00 equal; record_image_ops
does not.
-- Expected: one row (1.0), count = 2
-- Got: one row (1.0), count = 1 (wrong)
SELECT * FROM (SELECT a, count(*) FROM t_grp GROUP BY a) s
WHERE a *= ROW(1.0)::t_rec;
```
EXPLAIN shows the *= filter pushed inside the aggregate scan rather than
sitting above it as a Subquery Scan filter.
Cheers,
Florin
--
* Florin Irion *
* https://www.enterprisedb.com <https://www.enterprisedb.com/>*
On Fri, Jun 19, 2026 at 12:34 AM Florin Irion <irionr@gmail.com> wrote:
When an outer qual is pushed into a GROUP BY subquery it lands in havingQual (correct), but find_having_conflicts then misses the conflict because the pushed qual carries base-table Vars, not GROUP Vars — so the clause gets silently moved to WHERE, filtering before aggregation.
I don't think so. The pushed qual carries the GROUP Vars, not the
base-table Vars.
Reproducer:
```
CREATE TYPE t_rec AS (x numeric);
CREATE TABLE t_grp (a t_rec);
INSERT INTO t_grp VALUES (ROW(1.0)), (ROW(1.00)), (ROW(2));-- record_ops (default) considers 1.0 and 1.00 equal; record_image_ops does not.
-- Expected: one row (1.0), count = 2
-- Got: one row (1.0), count = 1 (wrong)
SELECT * FROM (SELECT a, count(*) FROM t_grp GROUP BY a) s
WHERE a *= ROW(1.0)::t_rec;
```
I ran your reproducer, and I got the Expected result:
SELECT * FROM (SELECT a, count(*) FROM t_grp GROUP BY a) s
WHERE a *= ROW(1.0)::t_rec;
a | count
-------+-------
(1.0) | 2
(1 row)
Curious how you got the wrong result with this patch.
EXPLAIN shows the *= filter pushed inside the aggregate scan rather than sitting above it as a Subquery Scan filter.
Here is the EXPLAIN I got:
EXPLAIN (COSTS OFF)
SELECT * FROM (SELECT a, count(*) FROM t_grp GROUP BY a) s
WHERE a *= ROW(1.0)::t_rec;
QUERY PLAN
---------------------------------------
HashAggregate
Group Key: t_grp.a
Filter: (t_grp.a *= '(1.0)'::t_rec)
-> Seq Scan on t_grp
(4 rows)
So the filter stays in HAVING instead of being pushed to Scan, which
is expected. I wonder how you get a plan with the filter being pushed
to scan. Can you show your output of EXPLAIN?
- Richard