Fix HAVING-to-WHERE pushdown with mismatched operator families

Started by Richard Guo24 days ago7 messageshackers
Jump to latest
#1Richard Guo
guofenglinux@gmail.com

Continued with the rabbit-hole around equality-relation issues, here
is another one in the HAVING-to-WHERE pushdown path.

f76686ce7 stopped the planner from pushing a HAVING clause to WHERE
when the clause's collation disagreed with the GROUP BY's
nondeterministic collation. The same shape of bug exists with
operator families: when a HAVING clause uses a comparison operator
from a different opfamily than the GROUP BY's eqop, pushing it to
WHERE can produce wrong results.

create type t_rec as (a numeric, b int);
create table t_having (id int, r t_rec);

insert into t_having values
(1, row(100, 1)::t_rec),
(2, row(100.0, 1)::t_rec),
(3, row(2, 2)::t_rec);

-- wrong result: count should be 2
select r, count(*) from t_having group by r having r *= row(100, 1)::t_rec;
r | count
---------+-------
(100,1) | 1
(1 row)

The fix in the attached patch mirrors f76686ce7's structure. We
detect the conflict before flatten_group_exprs while the HAVING clause
still contains GROUP Vars, and record the indices of unsafe clauses in
a Bitmapset that's consulted by the existing pushdown loop.

This issue can be reproduced as far back as v14. However, the fix
relies on RTE_GROUP to identify grouping expressions via GROUP Vars on
pre-flatten havingQual. As with f76686ce7, I'm inclined to back-patch
to v18 only.

Thoughts?

- Richard

Attachments:

v1-0001-Fix-HAVING-to-WHERE-pushdown-with-mismatched-oper.patchapplication/octet-stream; name=v1-0001-Fix-HAVING-to-WHERE-pushdown-with-mismatched-oper.patchDownload+334-37
#2Thom Brown
thom@linux.com
In reply to: Richard Guo (#1)
Re: Fix HAVING-to-WHERE pushdown with mismatched operator families

On Tue, 26 May 2026 at 09:48, Richard Guo <guofenglinux@gmail.com> wrote:

Continued with the rabbit-hole around equality-relation issues, here
is another one in the HAVING-to-WHERE pushdown path.

f76686ce7 stopped the planner from pushing a HAVING clause to WHERE
when the clause's collation disagreed with the GROUP BY's
nondeterministic collation. The same shape of bug exists with
operator families: when a HAVING clause uses a comparison operator
from a different opfamily than the GROUP BY's eqop, pushing it to
WHERE can produce wrong results.

create type t_rec as (a numeric, b int);
create table t_having (id int, r t_rec);

insert into t_having values
(1, row(100, 1)::t_rec),
(2, row(100.0, 1)::t_rec),
(3, row(2, 2)::t_rec);

-- wrong result: count should be 2
select r, count(*) from t_having group by r having r *= row(100, 1)::t_rec;
r | count
---------+-------
(100,1) | 1
(1 row)

The fix in the attached patch mirrors f76686ce7's structure. We
detect the conflict before flatten_group_exprs while the HAVING clause
still contains GROUP Vars, and record the indices of unsafe clauses in
a Bitmapset that's consulted by the existing pushdown loop.

This issue can be reproduced as far back as v14. However, the fix
relies on RTE_GROUP to identify grouping expressions via GROUP Vars on
pre-flatten havingQual. As with f76686ce7, I'm inclined to back-patch
to v18 only.

Thoughts?

Makes sense to me, but out of curiosity, while digging into these
opfamily mismatches, have you noticed if this same record_ops vs
record_image_ops inequality poses any risks to other optimisation
paths like window function pushdowns or partition pruning? And
apologies if that has already been discussed, but I couldn't find
mention of it.

Thom

#3Richard Guo
guofenglinux@gmail.com
In reply to: Thom Brown (#2)
Re: Fix HAVING-to-WHERE pushdown with mismatched operator families

On Tue, May 26, 2026 at 11:06 PM Thom Brown <thom@linux.com> wrote:

Makes sense to me, but out of curiosity, while digging into these
opfamily mismatches, have you noticed if this same record_ops vs
record_image_ops inequality poses any risks to other optimisation
paths like window function pushdowns or partition pruning? And
apologies if that has already been discussed, but I couldn't find
mention of it.

Thanks for raising these points. For partition pruning,
match_clause_to_partition_key() already checks both collation and
opfamily compatibility, so I don't think it has similar issues. I'm
not sure what is meant by "window function pushdowns", but your
question prompted me to look around, and I did notice that pushing
restriction clauses down into a subquery suffers from a similar
problem, specifically, when the subquery has DISTINCT, DISTINCT ON, or
a window PARTITION BY clause.

create type t_rec as (a numeric);
create table t (a t_rec, b int);
insert into t values (row(1.0), 10), (row(1.00), 20);

-- wrong result: should be 0 rows
select * from
(select distinct on (a) a, b from t order by a, b) s
where a *= row(1.00)::t_rec;
a | b
--------+----
(1.00) | 20
(1 row)

-- wrong result: rk should be 2
select * from
(select a, b, rank() over (partition by a order by b) as rk from t) s
where a *= row(1.00)::t_rec;
a | b | rk
--------+----+----
(1.00) | 20 | 1
(1 row)

In addtition, collation mismatch can also cause wrong results in this
area.

create collation ci (provider = icu, locale = 'und-u-ks-level2',
deterministic = false);
create table t1 (a text collate ci, b int);
insert into t1 values ('abc', 1), ('ABC', 2);

-- wrong result: should be 0 rows
select * from
(select distinct on (a) a, b from t1 order by a, b) s
where a = 'ABC' collate "C";
a | b
-----+---
ABC | 2
(1 row)

-- wrong result: rk should be 2
select * from
(select a, b, rank() over (partition by a order by b) as rk from t1) s
where a = 'ABC' collate "C";
a | b | rk
-----+---+----
ABC | 2 | 1
(1 row)

- Richard

#4Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#3)
Re: Fix HAVING-to-WHERE pushdown with mismatched operator families

On Wed, May 27, 2026 at 8:04 AM Richard Guo <guofenglinux@gmail.com> wrote:

Thanks for raising these points. For partition pruning,
match_clause_to_partition_key() already checks both collation and
opfamily compatibility, so I don't think it has similar issues. I'm
not sure what is meant by "window function pushdowns", but your
question prompted me to look around, and I did notice that pushing
restriction clauses down into a subquery suffers from a similar
problem, specifically, when the subquery has DISTINCT, DISTINCT ON, or
a window PARTITION BY clause.

I think all these issues belong to the same class of bug: the planner
moves a qual clause across a grouping layer, and the result is wrong
when the qual's equivalence relation disagrees with the grouping's,
either an opfamily mismatch or a nondeterministic-collation mismatch.
This includes HAVING-to-WHERE pushdown, as well as qual pushdown into
a subquery past its DISTINCT, DISTINCT ON, window PARTITION BY, or
set-operation grouping layer.

v2 attached tries to fix the full bug class through a shared walker
expression_has_grouping_conflict that detects either kind of conflict
in an expression tree. The walker takes a callback that maps each
Var to the grouping equality operator for its column (or InvalidOid
for non-grouping Vars). See the commit message for details.

- Richard

Attachments:

v2-0001-Fix-qual-pushdown-past-grouping-with-mismatched-e.patchapplication/octet-stream; name=v2-0001-Fix-qual-pushdown-past-grouping-with-mismatched-e.patchDownload+1224-208
#5Thom Brown
thom@linux.com
In reply to: Richard Guo (#3)
Re: Fix HAVING-to-WHERE pushdown with mismatched operator families

On Wed, 27 May 2026, 00:04 Richard Guo, <guofenglinux@gmail.com> wrote:

On Tue, May 26, 2026 at 11:06 PM Thom Brown <thom@linux.com> wrote:

Makes sense to me, but out of curiosity, while digging into these
opfamily mismatches, have you noticed if this same record_ops vs
record_image_ops inequality poses any risks to other optimisation
paths like window function pushdowns or partition pruning? And
apologies if that has already been discussed, but I couldn't find
mention of it.

Thanks for raising these points. For partition pruning,
match_clause_to_partition_key() already checks both collation and
opfamily compatibility, so I don't think it has similar issues. I'm
not sure what is meant by "window function pushdowns", but your
question prompted me to look around, and I did notice that pushing
restriction clauses down into a subquery suffers from a similar
problem, specifically, when the subquery has DISTINCT, DISTINCT ON, or
a window PARTITION BY clause.

Yeah, sorry, that wording wasn't clear. I just meant pushing a qual down
past a window function's PARTITION BY, which is the one case the planner
allows. Looks like that's exactly the rank() case you turned up, so you've
answered it better than I could have asked it. The v2 approach of treating
the whole thing as one bug class looks like the right call to me.

Regards

Thom

Show quoted text
#6Florin Irion
irionr@gmail.com
In reply to: Richard Guo (#4)
Re: Fix HAVING-to-WHERE pushdown with mismatched operator families

Il giorno gio 28 mag 2026 alle ore 11:11 Richard Guo <guofenglinux@gmail.com>
ha scritto:

On Wed, May 27, 2026 at 8:04 AM Richard Guo <guofenglinux@gmail.com>
wrote:

Thanks for raising these points. For partition pruning,
match_clause_to_partition_key() already checks both collation and
opfamily compatibility, so I don't think it has similar issues. I'm
not sure what is meant by "window function pushdowns", but your
question prompted me to look around, and I did notice that pushing
restriction clauses down into a subquery suffers from a similar
problem, specifically, when the subquery has DISTINCT, DISTINCT ON, or
a window PARTITION BY clause.

I think all these issues belong to the same class of bug: the planner
moves a qual clause across a grouping layer, and the result is wrong
when the qual's equivalence relation disagrees with the grouping's,
either an opfamily mismatch or a nondeterministic-collation mismatch.
This includes HAVING-to-WHERE pushdown, as well as qual pushdown into
a subquery past its DISTINCT, DISTINCT ON, window PARTITION BY, or
set-operation grouping layer.

v2 attached tries to fix the full bug class through a shared walker
expression_has_grouping_conflict that detects either kind of conflict
in an expression tree. The walker takes a callback that maps each
Var to the grouping equality operator for its column (or InvalidOid
for non-grouping Vars). See the commit message for details.

- Richard

Hi,

The patch fixes DISTINCT/window/set-op subqueries and HAVING, but does it
miss the analogous case for GROUP BY subqueries as the pushdown target?

When an outer qual is pushed into a GROUP BY subquery it lands in
havingQual (correct), but find_having_conflicts then misses the conflict
because the pushed qual carries base-table Vars, not GROUP Vars — so the
clause gets silently moved to WHERE, filtering before aggregation.

Reproducer:
```
CREATE TYPE t_rec AS (x numeric);
CREATE TABLE t_grp (a t_rec);
INSERT INTO t_grp VALUES (ROW(1.0)), (ROW(1.00)), (ROW(2));

-- record_ops (default) considers 1.0 and 1.00 equal; record_image_ops
does not.
-- Expected: one row (1.0), count = 2
-- Got: one row (1.0), count = 1 (wrong)
SELECT * FROM (SELECT a, count(*) FROM t_grp GROUP BY a) s
WHERE a *= ROW(1.0)::t_rec;
```

EXPLAIN shows the *= filter pushed inside the aggregate scan rather than
sitting above it as a Subquery Scan filter.

Cheers,
Florin
--
* Florin Irion *
* https://www.enterprisedb.com <https://www.enterprisedb.com/&gt;*

#7Richard Guo
guofenglinux@gmail.com
In reply to: Florin Irion (#6)
Re: Fix HAVING-to-WHERE pushdown with mismatched operator families

On Fri, Jun 19, 2026 at 12:34 AM Florin Irion <irionr@gmail.com> wrote:

When an outer qual is pushed into a GROUP BY subquery it lands in havingQual (correct), but find_having_conflicts then misses the conflict because the pushed qual carries base-table Vars, not GROUP Vars — so the clause gets silently moved to WHERE, filtering before aggregation.

I don't think so. The pushed qual carries the GROUP Vars, not the
base-table Vars.

Reproducer:
```
CREATE TYPE t_rec AS (x numeric);
CREATE TABLE t_grp (a t_rec);
INSERT INTO t_grp VALUES (ROW(1.0)), (ROW(1.00)), (ROW(2));

-- record_ops (default) considers 1.0 and 1.00 equal; record_image_ops does not.
-- Expected: one row (1.0), count = 2
-- Got: one row (1.0), count = 1 (wrong)
SELECT * FROM (SELECT a, count(*) FROM t_grp GROUP BY a) s
WHERE a *= ROW(1.0)::t_rec;
```

I ran your reproducer, and I got the Expected result:

SELECT * FROM (SELECT a, count(*) FROM t_grp GROUP BY a) s
WHERE a *= ROW(1.0)::t_rec;
a | count
-------+-------
(1.0) | 2
(1 row)

Curious how you got the wrong result with this patch.

EXPLAIN shows the *= filter pushed inside the aggregate scan rather than sitting above it as a Subquery Scan filter.

Here is the EXPLAIN I got:

EXPLAIN (COSTS OFF)
SELECT * FROM (SELECT a, count(*) FROM t_grp GROUP BY a) s
WHERE a *= ROW(1.0)::t_rec;
QUERY PLAN
---------------------------------------
HashAggregate
Group Key: t_grp.a
Filter: (t_grp.a *= '(1.0)'::t_rec)
-> Seq Scan on t_grp
(4 rows)

So the filter stays in HAVING instead of being pushed to Scan, which
is expected. I wonder how you get a plan with the filter being pushed
to scan. Can you show your output of EXPLAIN?

- Richard