Fix HAVING-to-WHERE pushdown with nondeterministic collations

Started by Richard Guoabout 1 month ago6 messageshackers
Jump to latest
#1Richard Guo
guofenglinux@gmail.com

As briefly discussed on Discord, when a GROUP BY clause uses a
nondeterministic collation, the planner's optimization of moving
HAVING clauses to WHERE can produce incorrect results if the HAVING
clause applies a stricter collation.

CREATE TABLE t (x TEXT COLLATE case_insensitive);
INSERT INTO t VALUES ('a'), ('A');

SELECT x, count(*) FROM t GROUP BY x HAVING x = 'a' COLLATE "C";

This returns count=1, but should return count=2.

The attached draft patch fixes this for HEAD by leveraging GROUP Vars
(Vars referencing RTE_GROUP) to detect collation conflicts on a
per-clause basis, so only unsafe clauses are kept in HAVING while safe
ones are still pushed. Please see the commit message for more
details.

For versions prior to v18, we do not have GROUP Vars. I wonder if we
can take a conservative approach: skipping the HAVING-to-WHERE
pushdown optimization entirely if any GROUP BY expression uses a
nondeterministic collation.

Thoughts and reviews are welcome.

- Richard

Attachments:

v1-0001-Fix-HAVING-to-WHERE-pushdown-with-nondeterministi.patchapplication/octet-stream; name=v1-0001-Fix-HAVING-to-WHERE-pushdown-with-nondeterministi.patchDownload+379-1
#2wenhui qiu
qiuwenhuifx@gmail.com
In reply to: Richard Guo (#1)
Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations

Hi Richard

+ if (OidIsValid(inputcollid))
+ {
+ List   *vars;
+
+ /*
+ * We use PVC_RECURSE_PLACEHOLDERS because PlaceHolderVars may have
+ * been introduced by pull_up_subqueries, and we need to look through
+ * them to find the underlying Vars.  We don't need to consider
+ * Aggrefs because clauses containing aggregates are already excluded
+ * from HAVING-to-WHERE pushdown by the contain_agg_clause check.
+ * Likewise, WindowFuncs are ignored since they cannot appear in a
+ * HAVING clause.
+ */
+ vars = pull_var_clause(node, PVC_RECURSE_PLACEHOLDERS);
+
+ foreach_node(Var, var, vars)
+ {
+ if (var->varno == *group_rtindex &&
+ OidIsValid(var->varcollid) &&
+ var->varcollid != inputcollid &&
+ !get_collation_isdeterministic(var->varcollid))
+ {
+ list_free(vars);
+ return true;
+ }
+ }
+
+ list_free(vars);
+ }
+
+ return expression_tree_walker(node, having_collation_conflict_walker,
+  group_rtindex);
+}

This might be overthinking, but I wonder if calling pull_var_clause() at
each walker step could introduce some overhead due to repeated subtree scans
,Should we add a test (SELECT x, count(*) FROM test3ci GROUP BY x HAVING
max(x) = 'abc' COLLATE case_sensitive;)

Thanks

On Tue, Mar 31, 2026 at 11:41 AM Richard Guo <guofenglinux@gmail.com> wrote:

Show quoted text

As briefly discussed on Discord, when a GROUP BY clause uses a
nondeterministic collation, the planner's optimization of moving
HAVING clauses to WHERE can produce incorrect results if the HAVING
clause applies a stricter collation.

CREATE TABLE t (x TEXT COLLATE case_insensitive);
INSERT INTO t VALUES ('a'), ('A');

SELECT x, count(*) FROM t GROUP BY x HAVING x = 'a' COLLATE "C";

This returns count=1, but should return count=2.

The attached draft patch fixes this for HEAD by leveraging GROUP Vars
(Vars referencing RTE_GROUP) to detect collation conflicts on a
per-clause basis, so only unsafe clauses are kept in HAVING while safe
ones are still pushed. Please see the commit message for more
details.

For versions prior to v18, we do not have GROUP Vars. I wonder if we
can take a conservative approach: skipping the HAVING-to-WHERE
pushdown optimization entirely if any GROUP BY expression uses a
nondeterministic collation.

Thoughts and reviews are welcome.

- Richard

#3Richard Guo
guofenglinux@gmail.com
In reply to: wenhui qiu (#2)
Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations

On Wed, Apr 1, 2026 at 11:19 AM wenhui qiu <qiuwenhuifx@gmail.com> wrote:

+ vars = pull_var_clause(node, PVC_RECURSE_PLACEHOLDERS);
+
+ foreach_node(Var, var, vars)
+ {
+ if (var->varno == *group_rtindex &&
+ OidIsValid(var->varcollid) &&
+ var->varcollid != inputcollid &&
+ !get_collation_isdeterministic(var->varcollid))
+ {
+ list_free(vars);
+ return true;
+ }
+ }
+
+ list_free(vars);

This might be overthinking, but I wonder if calling pull_var_clause() at each walker step could introduce some overhead due to repeated subtree scans

That's a good point, but I doubt that it'd be an issue in practice.
HAVING clauses are typically very small expressions. Even in unusual
queries, the clause size is bounded by what a human writes, which is
negligible compared to the work the planner does elsewhere.

Maybe we can avoid this by calling pull_var_clause once at the top of
each clause and reusing that var list at every node. But that can
introduce false positives. The pre-pulled list contains all GROUP
Vars from the entire clause, but a given operator node only acts on
the vars in its own subtree.

- Richard

#4Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#1)
Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations

On Tue, Mar 31, 2026 at 12:41 PM Richard Guo <guofenglinux@gmail.com> wrote:

The attached draft patch fixes this for HEAD by leveraging GROUP Vars
(Vars referencing RTE_GROUP) to detect collation conflicts on a
per-clause basis, so only unsafe clauses are kept in HAVING while safe
ones are still pushed. Please see the commit message for more
details.

I noticed a bug in this patch. The pull_var_clause call in
having_collation_conflict_walker needs to recurse through Aggrefs,
since they can still be present in havingQual at this point and we
need to look through them to reach any GROUP Vars in their direct
arguments. v2 attached fixes this.

For versions prior to v18, we do not have GROUP Vars. I wonder if we
can take a conservative approach: skipping the HAVING-to-WHERE
pushdown optimization entirely if any GROUP BY expression uses a
nondeterministic collation.

I'm afraid this approach would regress performance for queries that
currently benefit from the optimization. But a proper pre-v18 fix
would require a different approach from the v18+ one, since GROUP Vars
don't exist on earlier branches. Given the absence of field reports,
I don't think the risk of carrying a different fix on stable branches
is justified. So I'm inclined to back-patch this fix to v18 only.

Any thoughts?

- Richard

Attachments:

v2-0001-Fix-HAVING-to-WHERE-pushdown-with-nondeterministi.patchapplication/octet-stream; name=v2-0001-Fix-HAVING-to-WHERE-pushdown-with-nondeterministi.patchDownload+379-1
#5Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#3)
Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations

On Thu, Apr 2, 2026 at 7:11 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, Apr 1, 2026 at 11:19 AM wenhui qiu <qiuwenhuifx@gmail.com> wrote:

+ vars = pull_var_clause(node, PVC_RECURSE_PLACEHOLDERS);
+
+ foreach_node(Var, var, vars)
+ {
+ if (var->varno == *group_rtindex &&
+ OidIsValid(var->varcollid) &&
+ var->varcollid != inputcollid &&
+ !get_collation_isdeterministic(var->varcollid))
+ {
+ list_free(vars);
+ return true;
+ }
+ }
+
+ list_free(vars);

This might be overthinking, but I wonder if calling pull_var_clause() at each walker step could introduce some overhead due to repeated subtree scans

That's a good point, but I doubt that it'd be an issue in practice.
HAVING clauses are typically very small expressions. Even in unusual
queries, the clause size is bounded by what a human writes, which is
negligible compared to the work the planner does elsewhere.

I was about to push the v2 patch, but I just can't shake off the
concern Wenhui Qiu raised about the repeated subtree scan. I still
don't have a concrete real-world case where a query has a large enough
HAVING clause for it to matter, but let's just be paranoid.

I think we can fix it easily. The current walker calls
pull_var_clause() at every collation-aware node, which re-walks the
subtree. The fix is to flip it inside out: walk top-down, push
inputcollids onto a LIFO stack, and at each GROUP Var check against
the stack. This way, we only need to walk the expression tree once.
Attached v3 does this.

v3 also fixes the RowCompareExpr case. Unlike the node types covered
by exprInputCollation(), RowCompareExpr carries per-column
inputcollids[] rather than a single inputcollid, so we need to descend
into each (largs[i], rargs[i]) pair with the matching collation pushed
onto the stack. Without this, a HAVING clause like:

HAVING ROW(x, 1) < ROW('ABC' COLLATE case_sensitive, 1)

over a case_insensitive group would give wrong results.

- Richard

Attachments:

v3-0001-Fix-HAVING-to-WHERE-pushdown-with-nondeterministi.patchapplication/octet-stream; name=v3-0001-Fix-HAVING-to-WHERE-pushdown-with-nondeterministi.patchDownload+468-1
#6Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#5)
Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations

On Thu, Apr 30, 2026 at 12:08 PM Richard Guo <guofenglinux@gmail.com> wrote:

I was about to push the v2 patch, but I just can't shake off the
concern Wenhui Qiu raised about the repeated subtree scan. I still
don't have a concrete real-world case where a query has a large enough
HAVING clause for it to matter, but let's just be paranoid.

I think we can fix it easily. The current walker calls
pull_var_clause() at every collation-aware node, which re-walks the
subtree. The fix is to flip it inside out: walk top-down, push
inputcollids onto a LIFO stack, and at each GROUP Var check against
the stack. This way, we only need to walk the expression tree once.
Attached v3 does this.

v3 also fixes the RowCompareExpr case. Unlike the node types covered
by exprInputCollation(), RowCompareExpr carries per-column
inputcollids[] rather than a single inputcollid, so we need to descend
into each (largs[i], rargs[i]) pair with the matching collation pushed
onto the stack. Without this, a HAVING clause like:

HAVING ROW(x, 1) < ROW('ABC' COLLATE case_sensitive, 1)

over a case_insensitive group would give wrong results.

I've committed this and back-patched it to v18. I was not
back-patching further because pre-v18 branches would need a very
different and more complex fix due to the lack of the RTE_GROUP
mechanism. I think it's too risky, and doesn't seem justified given
the absence of field reports.

- Richard