Incremental Sort Cost Estimation Instability

Started by Andrei Lepikhov · over 1 year ago · 17 messages
#1 Andrei Lepikhov
lepihov@gmail.com
1 attachment(s)

Hi,

While designing an improvement to the sort cost model, I discovered
that the query plan can change if we slightly alter the query text
without changing its semantics. See the example below:

CREATE TABLE test(x integer, y integer,z text);
INSERT INTO test (x,y) SELECT x, 1 FROM generate_series(1,1000000) AS x;
CREATE INDEX ON test(x);
CREATE INDEX ON test(y);
VACUUM ANALYZE;
SET max_parallel_workers_per_gather = 0;

First query:

EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM test t1, test t2
WHERE t1.x=t2.y AND t1.y=t2.x GROUP BY t1.x,t1.y;

And the second one - just reverse the left and right sides of
expressions in the WHERE condition:

EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM test t1, test t2
WHERE t2.y=t1.x AND t2.x=t1.y GROUP BY t1.x,t1.y;

You can see two different plans here:

GroupAggregate (cost=37824.89..37824.96 rows=1 width=16)
Group Key: t1.y, t1.x
-> Incremental Sort (cost=37824.89..37824.94 rows=2 width=8)
Sort Key: t1.y, t1.x
Presorted Key: t1.y
-> Merge Join (cost=0.85..37824.88 rows=1 width=8)
Merge Cond: (t1.y = t2.x)
Join Filter: (t2.y = t1.x)
-> Index Scan using test_y_idx on test t1
-> Index Scan using test_x_idx on test t2

GroupAggregate (cost=37824.89..37824.92 rows=1 width=16)
Group Key: t1.x, t1.y
-> Sort (cost=37824.89..37824.90 rows=1 width=8)
Sort Key: t1.x, t1.y
Sort Method: quicksort Memory: 25kB
-> Merge Join (cost=0.85..37824.88 rows=1 width=8)
Merge Cond: (t1.y = t2.x)
Join Filter: (t2.y = t1.x)
-> Index Scan using test_y_idx on test t1
-> Index Scan using test_x_idx on test t2

Never mind for now that the best plan here would be an IncrementalSort
with presorted key t1.x. Just note that the plan changed without any
meaningful reason.
The cost_incremental_sort() routine causes this behaviour: it takes the
expression for estimating the number of groups from the first
EquivalenceClass member, and that order depends on the query syntax.
I tried to invent a simple solution for this minor case. The clearest
and most straightforward way here is to save a reference to the
expression that triggered the PathKey creation inside the PathKey itself.
See the sketch of the patch in the attachment.
I'm not sure this instability is worth fixing this way, but the
dependence of the optimisation outcome on the query text looks buggy.

--
regards, Andrei Lepikhov

Attachments:

0001-Add-a-reference-to-the-expression-which-has-triggere.patch (text/plain; charset=UTF-8)
From 8eb9998fe3306005d5067e53fef6a032515ebed5 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Wed, 26 Jun 2024 16:28:12 +0700
Subject: [PATCH] Add a reference to the expression which has triggered
 creation of this pathkey.

For the sake of consistency in the number-of-groups estimation in incremental
sort, we need information about the expression which caused the pathkey. This
doesn't guarantee a correct estimation, but at least the estimation doesn't
rely on the SQL string.
---
 contrib/postgres_fdw/postgres_fdw.c   |  6 +++--
 src/backend/optimizer/path/costsize.c | 28 +++++++++++++++++++----
 src/backend/optimizer/path/pathkeys.c | 33 ++++++++++++++++++++++-----
 src/include/nodes/pathnodes.h         |  2 ++
 src/include/optimizer/paths.h         |  3 ++-
 5 files changed, 59 insertions(+), 13 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 0bb9a5ae8f..da26d99fb1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -976,6 +976,7 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
 	foreach(lc, useful_eclass_list)
 	{
 		EquivalenceClass *cur_ec = lfirst(lc);
+		EquivalenceMember *em;
 		PathKey    *pathkey;
 
 		/* If redundant with what we did above, skip it. */
@@ -988,14 +989,15 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
 			continue;
 
 		/* If no pushable expression for this rel, skip it. */
-		if (find_em_for_rel(root, cur_ec, rel) == NULL)
+		if ((em = find_em_for_rel(root, cur_ec, rel)) == NULL)
 			continue;
 
 		/* Looks like we can generate a pathkey, so let's do it. */
 		pathkey = make_canonical_pathkey(root, cur_ec,
 										 linitial_oid(cur_ec->ec_opfamilies),
 										 BTLessStrategyNumber,
-										 false);
+										 false,
+										 em->em_expr); /* TODO */
 		useful_pathkeys_list = lappend(useful_pathkeys_list,
 									   list_make1(pathkey));
 	}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ee23ed7835..a64f3e5797 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2038,21 +2038,41 @@ cost_incremental_sort(Path *path,
 	foreach(l, pathkeys)
 	{
 		PathKey    *key = (PathKey *) lfirst(l);
-		EquivalenceMember *member = (EquivalenceMember *)
-			linitial(key->pk_eclass->ec_members);
+
+#ifdef USE_ASSERT_CHECKING
+		ListCell *lc;
+
+		/* Be paranoid but, caring about performance, don't check in production */
+		foreach(lc, key->pk_eclass->ec_members)
+		{
+			if (find_ec_member_matching_expr(
+								key->pk_eclass, key->source_expr,
+								pull_varnos(root, (Node *) key->source_expr)))
+				break;
+		}
+		if (!lc)
+			/*
+			 * XXX: Can we build PathKey over derived expression and does it
+			 * trigger this error?
+			 */
+			elog(ERROR, "PathKey source expression must be a part of EQ class");
+#endif
+
+		/* Reassure ourselves no one created pathkeys alternative way */
+		Assert(key->source_expr != NULL);
 
 		/*
 		 * Check if the expression contains Var with "varno 0" so that we
 		 * don't call estimate_num_groups in that case.
 		 */
-		if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr)))
+		if (bms_is_member(0, pull_varnos(root, (Node *) key->source_expr)))
 		{
 			unknown_varno = true;
 			break;
 		}
 
 		/* expression not containing any Vars with "varno 0" */
-		presortedExprs = lappend(presortedExprs, member->em_expr);
+		presortedExprs = lappend(presortedExprs, key->source_expr);
 
 		if (foreach_current_index(l) + 1 >= presorted_keys)
 			break;
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 416fc4e240..d859e79208 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -54,7 +54,7 @@ static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 PathKey *
 make_canonical_pathkey(PlannerInfo *root,
 					   EquivalenceClass *eclass, Oid opfamily,
-					   int strategy, bool nulls_first)
+					   int strategy, bool nulls_first, Expr *src_expr)
 {
 	PathKey    *pk;
 	ListCell   *lc;
@@ -89,6 +89,7 @@ make_canonical_pathkey(PlannerInfo *root,
 	pk->pk_opfamily = opfamily;
 	pk->pk_strategy = strategy;
 	pk->pk_nulls_first = nulls_first;
+	pk->source_expr = src_expr;
 
 	root->canon_pathkeys = lappend(root->canon_pathkeys, pk);
 
@@ -241,7 +242,7 @@ make_pathkey_from_sortinfo(PlannerInfo *root,
 
 	/* And finally we can find or create a PathKey node */
 	return make_canonical_pathkey(root, eclass, opfamily,
-								  strategy, nulls_first);
+								  strategy, nulls_first, expr);
 }
 
 /*
@@ -1117,7 +1118,8 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
 											   outer_ec,
 											   sub_pathkey->pk_opfamily,
 											   sub_pathkey->pk_strategy,
-											   sub_pathkey->pk_nulls_first);
+											   sub_pathkey->pk_nulls_first,
+											   (Expr *) outer_var);
 			}
 		}
 		else
@@ -1199,7 +1201,8 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
 													  outer_ec,
 													  sub_pathkey->pk_opfamily,
 													  sub_pathkey->pk_strategy,
-													  sub_pathkey->pk_nulls_first);
+													  sub_pathkey->pk_nulls_first,
+													  (Expr *) outer_var);
 					/* score = # of equivalence peers */
 					score = list_length(outer_ec->ec_members) - 1;
 					/* +1 if it matches the proper query_pathkeys item */
@@ -1643,6 +1646,7 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
 	List	   *pathkeys = NIL;
 	int			nClauses = list_length(mergeclauses);
 	EquivalenceClass **ecs;
+	Expr			 **exprs;
 	int		   *scores;
 	int			necs;
 	ListCell   *lc;
@@ -1657,6 +1661,7 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
 	 * duplicates) and their "popularity" scores.
 	 */
 	ecs = (EquivalenceClass **) palloc(nClauses * sizeof(EquivalenceClass *));
+	exprs = (Expr **) palloc(nClauses * sizeof(Expr *));
 	scores = (int *) palloc(nClauses * sizeof(int));
 	necs = 0;
 
@@ -1666,14 +1671,21 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
 		EquivalenceClass *oeclass;
 		int			score;
 		ListCell   *lc2;
+		Expr	   *expr;
 
 		/* get the outer eclass */
 		update_mergeclause_eclasses(root, rinfo);
 
 		if (rinfo->outer_is_left)
+		{
 			oeclass = rinfo->left_ec;
+			expr = (Expr *) linitial(((OpExpr *) rinfo->clause)->args);
+		}
 		else
+		{
 			oeclass = rinfo->right_ec;
+			expr = (Expr *) lsecond(((OpExpr *) rinfo->clause)->args);
+		}
 
 		/* reject duplicates */
 		for (j = 0; j < necs; j++)
@@ -1697,6 +1709,8 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
 		}
 
 		ecs[necs] = oeclass;
+		exprs[necs] = expr;
+
 		scores[necs] = score;
 		necs++;
 	}
@@ -1798,7 +1812,8 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
 										 ec,
 										 linitial_oid(ec->ec_opfamilies),
 										 BTLessStrategyNumber,
-										 false);
+										 false,
+										 exprs[best_j]);
 		/* can't be redundant because no duplicate ECs */
 		Assert(!pathkey_is_redundant(pathkey, pathkeys));
 		pathkeys = lappend(pathkeys, pathkey);
@@ -1852,18 +1867,23 @@ make_inner_pathkeys_for_merge(PlannerInfo *root,
 		EquivalenceClass *oeclass;
 		EquivalenceClass *ieclass;
 		PathKey    *pathkey;
+		Expr	   *src_expr;
 
 		update_mergeclause_eclasses(root, rinfo);
 
+		Assert(IsA(rinfo->clause, OpExpr) && rinfo->orclause == NULL);
+
 		if (rinfo->outer_is_left)
 		{
 			oeclass = rinfo->left_ec;
 			ieclass = rinfo->right_ec;
+			src_expr = (Expr *) lsecond(((OpExpr *) rinfo->clause)->args);
 		}
 		else
 		{
 			oeclass = rinfo->right_ec;
 			ieclass = rinfo->left_ec;
+			src_expr = (Expr *) linitial(((OpExpr *) rinfo->clause)->args);
 		}
 
 		/* outer eclass should match current or next pathkeys */
@@ -1891,7 +1911,8 @@ make_inner_pathkeys_for_merge(PlannerInfo *root,
 											 ieclass,
 											 opathkey->pk_opfamily,
 											 opathkey->pk_strategy,
-											 opathkey->pk_nulls_first);
+											 opathkey->pk_nulls_first,
+											 src_expr);
 
 		/*
 		 * Don't generate redundant pathkeys (which can happen if multiple
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 2ba297c117..7e99725e00 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1469,6 +1469,8 @@ typedef struct PathKey
 	Oid			pk_opfamily;	/* btree opfamily defining the ordering */
 	int			pk_strategy;	/* sort direction (ASC or DESC) */
 	bool		pk_nulls_first; /* do NULLs come before normal values? */
+
+	Expr	   *source_expr;	/* Expression, which triggered this creation */
 } PathKey;
 
 /*
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 5e88c0224a..4b99229ebf 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -264,7 +264,8 @@ extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
 extern List *append_pathkeys(List *target, List *source);
 extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 									   EquivalenceClass *eclass, Oid opfamily,
-									   int strategy, bool nulls_first);
+									   int strategy, bool nulls_first,
+									   Expr *src_expr);
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
-- 
2.45.2

#2 David Rowley
dgrowleyml@gmail.com
In reply to: Andrei Lepikhov (#1)
Re: Incremental Sort Cost Estimation Instability

On Thu, 27 Jun 2024 at 03:00, Andrei Lepikhov <lepihov@gmail.com> wrote:

I tried to invent a simple solution for this minor case. The clearest
and most straightforward way here is to save a reference to the
expression that triggered the PathKey creation inside the PathKey itself.
See the sketch of the patch in the attachment.
I'm not sure this instability is worth fixing this way, but the
dependence of the optimisation outcome on the query text looks buggy.

I don't think that's going to work, as it'll mean it just chooses
whichever expression was used when the PathKey was first created. For
your example query, both PathKeys are first created for the GROUP BY
clause in standard_qp_callback(). I only have to change the GROUP BY
in your query to use the equivalent column in the other table to get
it to revert to the plan you complained about.

postgres=# EXPLAIN (costs off) SELECT count(*) FROM test t1, test t2
WHERE t1.x=t2.y AND t1.y=t2.x GROUP BY t2.y,t2.x;
QUERY PLAN
----------------------------------------------------------
GroupAggregate
Group Key: t2.y, t2.x
-> Sort
Sort Key: t2.y, t2.x
-> Merge Join
Merge Cond: (t1.y = t2.x)
Join Filter: (t2.y = t1.x)
-> Index Scan using test_y_idx on test t1
-> Index Scan using test_x_idx on test t2
(9 rows)

Maybe doing something with estimate_num_groups() to find the
EquivalenceClass member with the least distinct values would be
better. I just can't think how that could be done in a performant way.

David

#3 Andrei Lepikhov
lepihov@gmail.com
In reply to: David Rowley (#2)
Re: Incremental Sort Cost Estimation Instability

On 12/9/2024 03:05, David Rowley wrote:

On Thu, 27 Jun 2024 at 03:00, Andrei Lepikhov <lepihov@gmail.com> wrote:

I tried to invent a simple solution for this minor case. The clearest
and most straightforward way here is to save a reference to the
expression that triggered the PathKey creation inside the PathKey itself.
See the sketch of the patch in the attachment.
I'm not sure this instability is worth fixing this way, but the
dependence of the optimisation outcome on the query text looks buggy.

I don't think that's going to work, as it'll mean it just chooses
whichever expression was used when the PathKey was first created. For
your example query, both PathKeys are first created for the GROUP BY
clause in standard_qp_callback(). I only have to change the GROUP BY
in your query to use the equivalent column in the other table to get
it to revert to the plan you complained about.

Yes, that's true. It's not an ideal solution so far - I'm looking for better ideas.

Maybe doing something with estimate_num_groups() to find the
EquivalenceClass member with the least distinct values would be
better. I just can't think how that could be done in a performant way.

The initial problem causes a wrong cost_sort estimation. Right now I'm
thinking about providing cost_sort() with the sort clauses instead of (or
in addition to) the pathkeys.

--
regards, Andrei Lepikhov

#4 David Rowley
dgrowleyml@gmail.com
In reply to: Andrei Lepikhov (#3)
Re: Incremental Sort Cost Estimation Instability

On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:

The initial problem causes a wrong cost_sort estimation. Right now I'm
thinking about providing cost_sort() with the sort clauses instead of (or
in addition to) the pathkeys.

I'm not quite sure why the sort clauses matter any more than the
EquivalenceClass. If the EquivalenceClass defines that all members
will have the same value for any given row, then, if we had to choose
any single member to drive the n_distinct estimate from, isn't the
most accurate distinct estimate from the member with the smallest
n_distinct estimate? (That assumes the less distinct member has every
value the more distinct member has, which might not be true)

David

#5 Tomas Vondra
tomas@vondra.me
In reply to: David Rowley (#4)
Re: Incremental Sort Cost Estimation Instability

On 9/12/24 12:12, David Rowley wrote:

On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:

The initial problem causes a wrong cost_sort estimation. Right now I'm
thinking about providing cost_sort() with the sort clauses instead of (or
in addition to) the pathkeys.

I'm not quite sure why the sort clauses matter any more than the
EquivalenceClass. If the EquivalenceClass defines that all members
will have the same value for any given row, then, if we had to choose
any single member to drive the n_distinct estimate from, isn't the
most accurate distinct estimate from the member with the smallest
n_distinct estimate? (That assumes the less distinct member has every
value the more distinct member has, which might not be true)

David

How large can the cost difference get? My assumption was it's correlated
with how different the ndistincts are for the two sides, so I tried

CREATE TABLE t1(x integer, y integer,z text);
CREATE TABLE t2(x integer, y integer,z text);

INSERT INTO t1 (x,y) SELECT x, 1
FROM generate_series(1,1000000) AS x;
INSERT INTO t2 (x,y) SELECT mod(x,1000), 1
FROM generate_series(1,1000000) AS x;

CREATE INDEX ON t1(x);
CREATE INDEX ON t2(x);
CREATE INDEX ON t1(y);
CREATE INDEX ON t2(y);

VACUUM ANALYZE;

Which changes the ndistinct for t2.x from 1M to 1k. I expected the
cost difference to get much larger, but in fact it does not really change:

GroupAggregate (cost=38.99..37886.88 rows=992 width=16) (actual rows=1
loops=1)

GroupAggregate (cost=37874.26..37904.04 rows=992 width=16) (actual
rows=1 loops=1)

That is, pretty much no difference - the total cost difference is tiny; I'd
even say it does not matter in practice (it seems well within the possible
impact of collecting a different random sample).

But the startup cost changes in a rather absurd way, while the rest of the
plan is exactly the same. We even know this:

-> Incremental Sort (cost=38.99..37869.52 rows=992 width=8)
Sort Key: t1.x, t1.y
Presorted Key: t1.x

in both cases. There's literally no other difference between these plans
visible in the explain ...

I'm not sure how to fix this, but it seems estimate_num_groups() needs
to do things differently. And I agree that looking for the minimum ndistinct
seems like the right approach. But isn't estimate_num_groups()
supposed to already do that? The comment says:

* 3. If the list contains Vars of different relations that are known equal
* due to equivalence classes, then drop all but one of the Vars from each
* known-equal set, keeping the one with smallest estimated # of values
* (since the extra values of the others can't appear in joined rows).
* Note the reason we only consider Vars of different relations is that
* if we considered ones of the same rel, we'd be double-counting the
* restriction selectivity of the equality in the next step.

I haven't debugged this, but how come this doesn't do the trick?

regards

--
Tomas Vondra

#6 Andrei Lepikhov
lepihov@gmail.com
In reply to: David Rowley (#4)
Re: Incremental Sort Cost Estimation Instability

On 12/9/2024 12:12, David Rowley wrote:

On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:

The initial problem causes a wrong cost_sort estimation. Right now I'm
thinking about providing cost_sort() with the sort clauses instead of (or
in addition to) the pathkeys.

I'm not quite sure why the sort clauses matter any more than the
EquivalenceClass. If the EquivalenceClass defines that all members
will have the same value for any given row, then, if we had to choose
any single member to drive the n_distinct estimate from, isn't the
most accurate distinct estimate from the member with the smallest
n_distinct estimate? (That assumes the less distinct member has every
value the more distinct member has, which might not be true)

Thanks for your efforts! Your idea looks more stable and applicable than
my patch.
BTW, it could still provide wrong ndistinct estimations if we choose a
sorting operator under clauses mentioned in the EquivalenceClass.
However, this thread's primary intention is to stabilize query plans, so
I'll try to implement your idea.

The second reason was to distinguish sortings by cost (see proposal [1])
because sometimes it could help to save CPU cycles on comparisons.
Having a lot of sort/grouping queries with only sporadic joins, I see
how profitable it could sometimes be - text or numeric grouping over a
mostly Cartesian join may be painful without fine-tuned sorting.

[1]: /messages/by-id/8742aaa8-9519-4a1f-91bd-364aec65f5cf@gmail.com

--
regards, Andrei Lepikhov

#7 Andrei Lepikhov
lepihov@gmail.com
In reply to: David Rowley (#4)
1 attachment(s)
Re: Incremental Sort Cost Estimation Instability

On 12/9/2024 12:12, David Rowley wrote:

On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:

The initial problem causes a wrong cost_sort estimation. Right now I'm
thinking about providing cost_sort() with the sort clauses instead of (or
in addition to) the pathkeys.

I'm not quite sure why the sort clauses matter any more than the
EquivalenceClass. If the EquivalenceClass defines that all members
will have the same value for any given row, then, if we had to choose
any single member to drive the n_distinct estimate from, isn't the
most accurate distinct estimate from the member with the smallest
n_distinct estimate? (That assumes the less distinct member has every
value the more distinct member has, which might not be true)

Finally, I implemented this approach in code (see attachment).
The effectiveness may be debatable, but the general approach looks even
better than the previous one.
I've changed the status to 'Needs review'.

--
regards, Andrei Lepikhov

Attachments:

v2-0001-Stabilize-incremental-sort-estimation.patch (text/plain; charset=UTF-8)
From 3d31893eeb3ed96c7c9292c6961a0455c2a3af4c Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Mon, 23 Sep 2024 14:30:28 +0200
Subject: [PATCH v2] Stabilize incremental sort estimation.

Carefully identify an expression that can represent the path key in sort
estimation. Columns may have different distinct estimations, which can trigger
wrong cost estimations and a wrong choice of sort strategy.
Sorting has only pathkeys as input for the cost estimation. This patch, instead
of blindly choosing the first equivalence class member, attempts to find
columns that are used under the sort operator being estimated and chooses the
smallest ndistinct value.
---
 src/backend/optimizer/path/costsize.c | 52 ++++++++++++++++++++++++---
 src/backend/optimizer/util/pathnode.c |  1 +
 src/include/optimizer/cost.h          |  1 +
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e1523d15df..9c08f6b80c 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1984,6 +1984,50 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 	*run_cost = cpu_operator_cost * tuples;
 }
 
+static Expr *
+identify_sort_expression(PlannerInfo *root, PathKey *key, Relids relids)
+{
+	EquivalenceMember  *candidate = linitial(key->pk_eclass->ec_members);
+	double 				ndistinct = -1.0; /* Not defined */
+
+	if (root == NULL || list_length(key->pk_eclass->ec_members) == 1)
+		/* Fast path */
+		return candidate->em_expr;
+
+	foreach_node(EquivalenceMember, em, key->pk_eclass->ec_members)
+	{
+		VariableStatData	vardata;
+		bool				isdefault = true;
+		double				ndist;
+
+		/* Sorting over single partition? */
+		if (em->em_is_child && bms_num_members(relids) > 1)
+			continue;
+
+		if (!bms_is_subset(em->em_relids, relids))
+			/*
+			 * Avoid expressions not based on a table column. At least, the
+			 * candidate value already initialised as the first EC member.
+			 */
+			continue;
+
+		/* Let's check candidate's ndistinct value */
+		examine_variable(root, (Node *) em->em_expr, 0, &vardata);
+		if (HeapTupleIsValid(vardata.statsTuple))
+			ndist = get_variable_numdistinct(&vardata, &isdefault);
+		ReleaseVariableStats(vardata);
+
+		if (ndistinct < 0.0 || (!isdefault && ndist < ndistinct))
+		{
+			candidate = em;
+			ndistinct = ndist;
+		}
+	}
+
+	Assert(candidate != NULL);
+	return candidate->em_expr;
+}
+
 /*
  * cost_incremental_sort
  * 	Determines and returns the cost of sorting a relation incrementally, when
@@ -1999,6 +2043,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 void
 cost_incremental_sort(Path *path,
 					  PlannerInfo *root, List *pathkeys, int presorted_keys,
+					  Relids relids,
 					  int input_disabled_nodes,
 					  Cost input_startup_cost, Cost input_total_cost,
 					  double input_tuples, int width, Cost comparison_cost, int sort_mem,
@@ -2053,21 +2098,20 @@ cost_incremental_sort(Path *path,
 	foreach(l, pathkeys)
 	{
 		PathKey    *key = (PathKey *) lfirst(l);
-		EquivalenceMember *member = (EquivalenceMember *)
-			linitial(key->pk_eclass->ec_members);
+		Expr	   *expr = identify_sort_expression(root, key, relids);
 
 		/*
 		 * Check if the expression contains Var with "varno 0" so that we
 		 * don't call estimate_num_groups in that case.
 		 */
-		if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr)))
+		if (bms_is_member(0, pull_varnos(root, (Node *) expr)))
 		{
 			unknown_varno = true;
 			break;
 		}
 
 		/* expression not containing any Vars with "varno 0" */
-		presortedExprs = lappend(presortedExprs, member->em_expr);
+		presortedExprs = lappend(presortedExprs, expr);
 
 		if (foreach_current_index(l) + 1 >= presorted_keys)
 			break;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fc97bf6ee2..8ed566b1bc 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3055,6 +3055,7 @@ create_incremental_sort_path(PlannerInfo *root,
 
 	cost_incremental_sort(&pathnode->path,
 						  root, pathkeys, presorted_keys,
+						  subpath->parent->relids,
 						  subpath->disabled_nodes,
 						  subpath->startup_cost,
 						  subpath->total_cost,
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 854a782944..ebd5f2c5d9 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -114,6 +114,7 @@ extern void cost_sort(Path *path, PlannerInfo *root,
 					  double limit_tuples);
 extern void cost_incremental_sort(Path *path,
 								  PlannerInfo *root, List *pathkeys, int presorted_keys,
+								  Relids relids,
 								  int input_disabled_nodes,
 								  Cost input_startup_cost, Cost input_total_cost,
 								  double input_tuples, int width, Cost comparison_cost, int sort_mem,
-- 
2.46.1

#8 Andrei Lepikhov
lepihov@gmail.com
In reply to: Tomas Vondra (#5)
Re: Incremental Sort Cost Estimation Instability

On 12/9/2024 16:57, Tomas Vondra wrote:

On 9/12/24 12:12, David Rowley wrote:

On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:

I'm not sure how to fix this, but it seems estimate_num_groups() needs
to do things differently. And I agree that looking for the minimum ndistinct
seems like the right approach. But isn't estimate_num_groups()
supposed to already do that? The comment says:

I've rewritten the code in the previous email. It looks like we can try
to rewrite estimate_num_groups to do it more effectively, but I haven't
done it yet.
Regarding the tiny change in the cost, my initial reason was to teach
cost_sort to differentiate sort orderings: begin by considering the number of
columns in the cost estimation and then consider the distinct estimation
of the first column.
BTW, it was triggered by user reports, where a slight change in the
balance between MergeAppend/GatherMerge/Sort/IncrementalSort (or columns
order) could give significant profit. Especially when grouping millions
of rows in 2-4 bytea columns.

--
regards, Andrei Lepikhov

#9 Andrei Lepikhov
lepihov@gmail.com
In reply to: Tomas Vondra (#5)
Re: Incremental Sort Cost Estimation Instability

On 12/9/2024 16:57, Tomas Vondra wrote:

On 9/12/24 12:12, David Rowley wrote:

On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:

but isn't estimate_num_groups()
supposed to already do that? The comment says:

* 3. If the list contains Vars of different relations that are known equal
* due to equivalence classes, then drop all but one of the Vars from each
* known-equal set, keeping the one with smallest estimated # of values
* (since the extra values of the others can't appear in joined rows).
* Note the reason we only consider Vars of different relations is that
* if we considered ones of the same rel, we'd be double-counting the
* restriction selectivity of the equality in the next step.

I haven't debugged this, but how come this doesn't do the trick?

I've got your point now.
Unfortunately, that comment says that estimate_num_groups removes
duplicates from the list of grouping expressions (see
exprs_known_equal). But it doesn't search ec_members to find the
best-fitting expression for each grouping position.

--
regards, Andrei Lepikhov

#10 Andrei Lepikhov
lepihov@gmail.com
In reply to: Andrei Lepikhov (#7)
1 attachment(s)
Re: Incremental Sort Cost Estimation Instability

On 9/23/24 20:02, Andrei Lepikhov wrote:

On 12/9/2024 12:12, David Rowley wrote:

On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:

The initial problem causes a wrong cost_sort estimation. Right now I'm
thinking about providing cost_sort() with the sort clauses instead of (or
in addition to) the pathkeys.

I'm not quite sure why the sort clauses matter any more than the
EquivalenceClass.  If the EquivalenceClass defines that all members
will have the same value for any given row, then, if we had to choose
any single member to drive the n_distinct estimate from, isn't the
most accurate distinct estimate from the member with the smallest
n_distinct estimate?  (That assumes the less distinct member has every
value the more distinct member has, which might not be true)

Finally, I implemented this approach in code (see attachment).
The effectiveness may be debatable, but the general approach looks even
better than the previous one.
I've changed the status to 'Needs review'.

Minor change to make compiler and cfbot happy.

--
regards, Andrei Lepikhov

Attachments:

v3-0001-Stabilize-incremental-sort-estimation.patch (text/x-patch; charset=UTF-8)
From bdaf7cd46eb10353b6cbaf5f22d1ea835881db24 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Mon, 23 Sep 2024 14:30:28 +0200
Subject: [PATCH v3] Stabilize incremental sort estimation.

Carefully identify an expression that can represent the path key in sort
estimation. Columns may have different distinct estimations, which can trigger
wrong cost estimations and a wrong choice of sort strategy.
Sorting has only pathkeys as input for the cost estimation. This patch, instead
of blindly choosing the first equivalence class member, attempts to find
columns that are used under the sort operator being estimated and chooses the
smallest ndistinct value.
---
 src/backend/optimizer/path/costsize.c | 52 ++++++++++++++++++++++++---
 src/backend/optimizer/util/pathnode.c |  1 +
 src/include/optimizer/cost.h          |  1 +
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e1523d15df..01f166f504 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1984,6 +1984,50 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 	*run_cost = cpu_operator_cost * tuples;
 }
 
+static Expr *
+identify_sort_expression(PlannerInfo *root, PathKey *key, Relids relids)
+{
+	EquivalenceMember  *candidate = linitial(key->pk_eclass->ec_members);
+	double 				ndistinct = -1.0; /* Not defined */
+
+	if (root == NULL || list_length(key->pk_eclass->ec_members) == 1)
+		/* Fast path */
+		return candidate->em_expr;
+
+	foreach_node(EquivalenceMember, em, key->pk_eclass->ec_members)
+	{
+		VariableStatData	vardata;
+		bool				isdefault = true;
+		double				ndist = -1.0;
+
+		/* Sorting over single partition? */
+		if (em->em_is_child && bms_num_members(relids) > 1)
+			continue;
+
+		if (!bms_is_subset(em->em_relids, relids))
+			/*
+			 * Avoid expressions not based on a table column. At least, the
+			 * candidate value already initialised as the first EC member.
+			 */
+			continue;
+
+		/* Let's check candidate's ndistinct value */
+		examine_variable(root, (Node *) em->em_expr, 0, &vardata);
+		if (HeapTupleIsValid(vardata.statsTuple))
+			ndist = get_variable_numdistinct(&vardata, &isdefault);
+		ReleaseVariableStats(vardata);
+
+		if (ndistinct < 0.0 || (!isdefault && ndist < ndistinct))
+		{
+			candidate = em;
+			ndistinct = ndist;
+		}
+	}
+
+	Assert(candidate != NULL);
+	return candidate->em_expr;
+}
+
 /*
  * cost_incremental_sort
  * 	Determines and returns the cost of sorting a relation incrementally, when
@@ -1999,6 +2043,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 void
 cost_incremental_sort(Path *path,
 					  PlannerInfo *root, List *pathkeys, int presorted_keys,
+					  Relids relids,
 					  int input_disabled_nodes,
 					  Cost input_startup_cost, Cost input_total_cost,
 					  double input_tuples, int width, Cost comparison_cost, int sort_mem,
@@ -2053,21 +2098,20 @@ cost_incremental_sort(Path *path,
 	foreach(l, pathkeys)
 	{
 		PathKey    *key = (PathKey *) lfirst(l);
-		EquivalenceMember *member = (EquivalenceMember *)
-			linitial(key->pk_eclass->ec_members);
+		Expr	   *expr = identify_sort_expression(root, key, relids);
 
 		/*
 		 * Check if the expression contains Var with "varno 0" so that we
 		 * don't call estimate_num_groups in that case.
 		 */
-		if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr)))
+		if (bms_is_member(0, pull_varnos(root, (Node *) expr)))
 		{
 			unknown_varno = true;
 			break;
 		}
 
 		/* expression not containing any Vars with "varno 0" */
-		presortedExprs = lappend(presortedExprs, member->em_expr);
+		presortedExprs = lappend(presortedExprs, expr);
 
 		if (foreach_current_index(l) + 1 >= presorted_keys)
 			break;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fc97bf6ee2..8ed566b1bc 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3055,6 +3055,7 @@ create_incremental_sort_path(PlannerInfo *root,
 
 	cost_incremental_sort(&pathnode->path,
 						  root, pathkeys, presorted_keys,
+						  subpath->parent->relids,
 						  subpath->disabled_nodes,
 						  subpath->startup_cost,
 						  subpath->total_cost,
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 854a782944..ebd5f2c5d9 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -114,6 +114,7 @@ extern void cost_sort(Path *path, PlannerInfo *root,
 					  double limit_tuples);
 extern void cost_incremental_sort(Path *path,
 								  PlannerInfo *root, List *pathkeys, int presorted_keys,
+								  Relids relids,
 								  int input_disabled_nodes,
 								  Cost input_startup_cost, Cost input_total_cost,
 								  double input_tuples, int width, Cost comparison_cost, int sort_mem,
-- 
2.39.5

#11Andrei Lepikhov
lepihov@gmail.com
In reply to: Andrei Lepikhov (#10)
1 attachment(s)
Re: Incremental Sort Cost Estimation Instability

On 10/8/24 11:33, Andrei Lepikhov wrote:

On 9/23/24 20:02, Andrei Lepikhov wrote:

On 12/9/2024 12:12, David Rowley wrote:

On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com>

Minor change to make compiler and cfbot happy

Now, this thread looks connected to [1]. However, it still has
independent profit, which can be discussed separately.
After the introduction of the em->em_ndistinct cache, I played around
with the idea of letting the estimate_num_groups use this cache.
By chance, I found out that we have one more instability case like the
following:

DROP TABLE IF EXISTS test;
CREATE TABLE test (x int, y int, z int);
INSERT INTO test (x,y,z) (SELECT random()*1E5, random()*2, random() FROM
generate_series(1,1e4));
VACUUM ANALYZE test;

EXPLAIN SELECT count(*) FROM test WHERE x=y GROUP BY x,y;
EXPLAIN SELECT count(*) FROM test WHERE x=y GROUP BY y,x;

Here, you can see that depending on the initial order of grouping,
Postgres chooses different columns for grouping. Doing that causes
different estimations - one of them is definitely wrong:

GroupAggregate (cost=181.41..182.29 rows=50 width=16)
GroupAggregate (cost=181.41..181.82 rows=3 width=16)

That happens because when estimating the number of groups, Postgres
doesn't consider EquivalenceClass, which can let him correct group
estimation at a low price.
It may be done inside make_pathkeys_for_sortclauses_extended by choosing a
column with a lower number of distinct values, but IMO, it is better to do
it at the moment of the number-of-groups estimation.

Thoughts? Is it a real issue or just a non-practical corner case?

The new version of the patch is attached.

[1] /messages/by-id/8742aaa8-9519-4a1f-91bd-364aec65f5cf@gmail.com
--
regards, Andrei Lepikhov

Attachments:

v4-0001-Stabilise-incremental-sort-cost-calculation.patchtext/x-patch; charset=UTF-8; name=v4-0001-Stabilise-incremental-sort-cost-calculation.patchDownload
From 5eb884cbbd9c2e356d5d855da46d7e62d101b8b9 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 29 Oct 2024 08:49:33 +0700
Subject: [PATCH v2 1/4] Stabilise incremental sort cost calculation.

Carefully identify a column/expression that can represent the path key in the
cost calculation of a specific sort operator. Columns may have different
numbers of distinct values, which is why the order of columns in the sort
operation may impact the number of comparison function calls.
Sorting has only pathkeys as input for the cost estimation. Instead of blindly
choosing the first equivalence class member, this patch attempts to find the
expression with the smallest ndistinct value.

TODO: Filtering out EC members external to this sort operator is not a big
deal, but it would require passing the underlying relids to the cost
calculation routine, which would change the API. So, here we stay as simple
as possible.

Add the number of distinct values - em_ndistinct - into EquivalenceMember.
It may additionally be used later in group-number estimation.
---
 src/backend/optimizer/path/costsize.c         | 72 +++++++++++++++++--
 src/backend/optimizer/path/equivclass.c       |  1 +
 src/include/nodes/pathnodes.h                 |  2 +
 .../regress/expected/incremental_sort.out     | 51 +++++++++++++
 src/test/regress/sql/incremental_sort.sql     | 31 ++++++++
 5 files changed, 152 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 2bb6db1df7..686d5883d1 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -203,6 +203,8 @@ static int32 get_expr_width(PlannerInfo *root, const Node *expr);
 static double relation_byte_size(double tuples, int width);
 static double page_size(double tuples, int width);
 static double get_parallel_divisor(Path *path);
+static EquivalenceMember *identify_sort_ecmember(PlannerInfo *root,
+												 EquivalenceClass *ec);
 
 
 /*
@@ -2052,22 +2054,21 @@ cost_incremental_sort(Path *path,
 	 */
 	foreach(l, pathkeys)
 	{
-		PathKey    *key = (PathKey *) lfirst(l);
-		EquivalenceMember *member = (EquivalenceMember *)
-			linitial(key->pk_eclass->ec_members);
+		PathKey			   *key = (PathKey *) lfirst(l);
+		EquivalenceMember  *em = identify_sort_ecmember(root, key->pk_eclass);
 
 		/*
 		 * Check if the expression contains Var with "varno 0" so that we
 		 * don't call estimate_num_groups in that case.
 		 */
-		if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr)))
+		if (bms_is_member(0, pull_varnos(root, (Node *) em->em_expr)))
 		{
 			unknown_varno = true;
 			break;
 		}
 
 		/* expression not containing any Vars with "varno 0" */
-		presortedExprs = lappend(presortedExprs, member->em_expr);
+		presortedExprs = lappend(presortedExprs, em->em_expr);
 
 		if (foreach_current_index(l) + 1 >= presorted_keys)
 			break;
@@ -6604,3 +6605,64 @@ compute_gather_rows(Path *path)
 
 	return clamp_row_est(path->rows * get_parallel_divisor(path));
 }
+
+/*
+ * Find suitable member of the equivalence class.
+ * Passing through the list of EC members find the member with minimum of
+ * distinct values. Cache estimated number of distincts in the em_ndistinct
+ * field of each member.
+ */
+static EquivalenceMember *
+identify_sort_ecmember(PlannerInfo *root, EquivalenceClass *ec)
+{
+	EquivalenceMember  *candidate = linitial(ec->ec_members);
+
+	if (root == NULL)
+		/* Fast path */
+		return candidate;
+
+	foreach_node(EquivalenceMember, em, ec->ec_members)
+	{
+		VariableStatData	vardata;
+
+		if (em->em_ndistinct == 0.)
+			/* Nothing helpful */
+			continue;
+
+		if (em->em_is_child || em->em_is_const || bms_is_empty(em->em_relids) ||
+			bms_is_member(0, em->em_relids))
+		{
+			em->em_ndistinct = 0.;
+			continue;
+		}
+
+		if (em->em_ndistinct < 0.)
+		{
+			bool	isdefault = true;
+			double	ndist = 0.;
+
+			/* Let's check candidate's ndistinct value */
+			examine_variable(root, (Node *) em->em_expr, 0, &vardata);
+			if (HeapTupleIsValid(vardata.statsTuple))
+				ndist = get_variable_numdistinct(&vardata, &isdefault);
+			ReleaseVariableStats(vardata);
+
+			if (isdefault)
+			{
+				em->em_ndistinct = 0.;
+				continue;
+			}
+
+			em->em_ndistinct = ndist;
+		}
+
+		Assert(em->em_ndistinct > 0.);
+
+		if (candidate->em_ndistinct == 0. ||
+			em->em_ndistinct < candidate->em_ndistinct)
+			candidate = em;
+	}
+
+	Assert(candidate != NULL);
+	return candidate;
+}
\ No newline at end of file
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index fae137dd82..3552614502 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -525,6 +525,7 @@ add_eq_member(EquivalenceClass *ec, Expr *expr, Relids relids,
 	em->em_datatype = datatype;
 	em->em_jdomain = jdomain;
 	em->em_parent = parent;
+	em->em_ndistinct = -1.0;
 
 	if (bms_is_empty(relids))
 	{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index add0f9e45f..4ef5256a7f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1448,6 +1448,8 @@ typedef struct EquivalenceMember
 	JoinDomain *em_jdomain;		/* join domain containing the source clause */
 	/* if em_is_child is true, this links to corresponding EM for top parent */
 	struct EquivalenceMember *em_parent pg_node_attr(read_write_ignore);
+
+	double em_ndistinct; /* cached value of ndistinct: 0- default value or 'unknown'; -1 - not defined yet */
 } EquivalenceMember;
 
 /*
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index 2df7a5db12..409eb8a4b8 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1722,3 +1722,54 @@ order by t1.four, t1.two limit 1;
                ->  Seq Scan on tenk1 t2
 (12 rows)
 
+-- Check:
+-- commuting of sides in an expression doesn't influence cost estimation
+CREATE TABLE sort_ndist_t1 (x numeric, y numeric);
+CREATE TABLE sort_ndist_t2 (x numeric, y numeric);
+INSERT INTO sort_ndist_t1 (x,y)
+  SELECT gs%10, gs%10 FROM generate_series(1,1E4) AS gs;
+INSERT INTO sort_ndist_t2 (x,y)
+  SELECT gs, gs FROM generate_series(1,1E4) AS gs;
+CREATE INDEX t1_idx ON sort_ndist_t1 (x);
+CREATE INDEX t2_idx ON sort_ndist_t2 (x);
+VACUUM ANALYZE sort_ndist_t1, sort_ndist_t2;
+SET enable_hashjoin = 'off';
+-- Having lots of duplicates after the join it is more effective to use plain
+-- Sort instead of incremental sort: with small number of groups we do the same
+-- stuff like Sort but with extra penalty.
+EXPLAIN (COSTS OFF)
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t1.x=t2.x
+ORDER BY t1.x,t1.y;
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Sort
+   Sort Key: t1.x, t1.y
+   ->  Nested Loop
+         ->  Seq Scan on sort_ndist_t1 t1
+         ->  Memoize
+               Cache Key: t1.x
+               Cache Mode: logical
+               ->  Index Only Scan using t2_idx on sort_ndist_t2 t2
+                     Index Cond: (x = t1.x)
+(9 rows)
+
+EXPLAIN (COSTS OFF) -- the plan must be the same as above
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t2.x=t1.x
+ORDER BY t1.x,t1.y;
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Sort
+   Sort Key: t1.x, t1.y
+   ->  Nested Loop
+         ->  Seq Scan on sort_ndist_t1 t1
+         ->  Memoize
+               Cache Key: t1.x
+               Cache Mode: logical
+               ->  Index Only Scan using t2_idx on sort_ndist_t2 t2
+                     Index Cond: (x = t1.x)
+(9 rows)
+
+RESET enable_hashjoin;
+DROP TABLE sort_ndist_t1, sort_ndist_t2;
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index 98b20e17e1..af4d145395 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -298,3 +298,34 @@ explain (costs off)
 select * from
   (select * from tenk1 order by four) t1 join tenk1 t2 on t1.four = t2.four and t1.two = t2.two
 order by t1.four, t1.two limit 1;
+
+-- Check:
+-- commuting of sides in an expression doesn't influence cost estimation
+CREATE TABLE sort_ndist_t1 (x numeric, y numeric);
+CREATE TABLE sort_ndist_t2 (x numeric, y numeric);
+
+INSERT INTO sort_ndist_t1 (x,y)
+  SELECT gs%10, gs%10 FROM generate_series(1,1E4) AS gs;
+INSERT INTO sort_ndist_t2 (x,y)
+  SELECT gs, gs FROM generate_series(1,1E4) AS gs;
+CREATE INDEX t1_idx ON sort_ndist_t1 (x);
+CREATE INDEX t2_idx ON sort_ndist_t2 (x);
+VACUUM ANALYZE sort_ndist_t1, sort_ndist_t2;
+
+SET enable_hashjoin = 'off';
+
+-- Having lots of duplicates after the join it is more effective to use plain
+-- Sort instead of incremental sort: with small number of groups we do the same
+-- stuff like Sort but with extra penalty.
+EXPLAIN (COSTS OFF)
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t1.x=t2.x
+ORDER BY t1.x,t1.y;
+
+EXPLAIN (COSTS OFF) -- the plan must be the same as above
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t2.x=t1.x
+ORDER BY t1.x,t1.y;
+
+RESET enable_hashjoin;
+DROP TABLE sort_ndist_t1, sort_ndist_t2;
-- 
2.39.5

#12Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Andrei Lepikhov (#11)
Re: Incremental Sort Cost Estimation Instability

Hi!

On 07.11.2024 08:57, Andrei Lepikhov wrote:

On 10/8/24 11:33, Andrei Lepikhov wrote:

On 9/23/24 20:02, Andrei Lepikhov wrote:

On 12/9/2024 12:12, David Rowley wrote:

On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com>

Minor change to make compiler and cfbot happy

Now, this thread looks connected to the [1]. However, it still has
independent profit, which can be discussed separately.
After the introduction of the em->em_ndistinct cache, I played around
with the idea of letting the estimate_num_groups use this cache.
Occasionally found out  that we have one more instability case like
the following:

DROP TABLE IF EXISTS test;
CREATE TABLE test (x int, y int, z int);
INSERT INTO test (x,y,z) (SELECT random()*1E5, random()*2, random()
FROM generate_series(1,1e4));
VACUUM ANALYZE test;

EXPLAIN SELECT count(*) FROM test WHERE x=y GROUP BY x,y;
EXPLAIN SELECT count(*) FROM test WHERE x=y GROUP BY y,x;

Here, you can see that depending on the initial order of grouping,
Postgres chooses different columns for grouping. Doing that causes
different estimations - one of them is definitely wrong:

GroupAggregate  (cost=181.41..182.29 rows=50 width=16)
GroupAggregate  (cost=181.41..181.82 rows=3 width=16)

That happens because when estimating the number of groups, Postgres
doesn't consider EquivalenceClass, which can let him correct group
estimation at a low price.
It may be done inside the make_pathkeys_for_sortclauses_extended by
choosing a column with a lower number of distinct, but IMO, it is
better to do it at the moment of the number of groups estimation.

Thoughts? Is it a real issue or just a non-practical corner case?

The new version of the patch is attached.

[1]
/messages/by-id/8742aaa8-9519-4a1f-91bd-364aec65f5cf@gmail.com

But you haven't considered the case when you need to use non-cached
values - for example, if ndistinct has already changed. Look: here x has
the minimum ndistinct at first, and then column z does:

alena@postgres=# delete from test;
DELETE 10000
alena@postgres=# INSERT INTO test (x,y,z) (SELECT id%3, id*2, id FROM
generate_series(1,1e4) as id);
INSERT 0 10000
alena@postgres=# EXPLAIN SELECT * FROM test where x=y ORDER BY x,y,z;
                          QUERY PLAN
--------------------------------------------------------------
 Sort  (cost=196.88..197.02 rows=56 width=12)
   Sort Key: x, z
   ->  Seq Scan on test  (cost=0.00..195.25 rows=56 width=12)
         Filter: (x = y)
(4 rows)

alena@postgres=# delete from test;
DELETE 10000
alena@postgres=# INSERT INTO test (x,y,z) (SELECT id, id*2, id%3 FROM
generate_series(1,1e4) as id);

INSERT 0 10000
alena@postgres=# vacuum analyze;
VACUUM
alena@postgres=# EXPLAIN SELECT * FROM test where x=y ORDER BY x,y,z;
                          QUERY PLAN
--------------------------------------------------------------
 Sort  (cost=235.41..235.54 rows=50 width=12)
   Sort Key: x, z
   ->  Seq Scan on test  (cost=0.00..234.00 rows=50 width=12)
         Filter: (x = y)
(4 rows)

but the order of the columns does not change, as you can see.

--
Regards,
Alena Rybakina
Postgres Professional

#13Andrei Lepikhov
lepihov@gmail.com
In reply to: Alena Rybakina (#12)
Re: Incremental Sort Cost Estimation Instability

On 11/7/24 18:06, Alena Rybakina wrote:

On 07.11.2024 08:57, Andrei Lepikhov wrote:

That happens because when estimating the number of groups, Postgres
doesn't consider EquivalenceClass, which can let him correct group
estimation at a low price.
It may be done inside the make_pathkeys_for_sortclauses_extended by
choosing a column with a lower number of distinct, but IMO, it is
better to do it at the moment of the number of groups estimation.

Thoughts? Is it a real issue or just a non-practical corner case?

The new version of the patch is attached.

[1] https://www.postgresql.org/message-id/flat/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com

But you haven’t considered the case when you need to use non-cached
values, for example, if ndistinct has already changed. Look, here x has
a minimum ndistinct, and then column z:

but the order of the columns does not change, as you can see.

I'm unsure what you mean by 'cached value' or 'changed ndistinct'.
Also, I don't understand the issue you tried to show with your examples.
My point was that an equality expression can be used to modify
statistics-based decisions on the number of groups. Look:

A.x, distincts = 1000
A.y, distincts = 10

After the filter 'A.x=A.y' it is impossible to get more than 10 groups
on the A.x as well as on the A.y column. So, we have a tool to correct
the estimation considering equivalence classes.

--
regards, Andrei Lepikhov

#14Andrei Lepikhov
lepihov@gmail.com
In reply to: Andrei Lepikhov (#11)
1 attachment(s)
Re: Incremental Sort Cost Estimation Instability

Hi,

Revising the patch, I found out that it may make sense to teach
estimate_num_groups to process a PathKey node inside the presortedExprs
list. In this case, it can pass through the EquivalenceClass members and
choose the correct number of distinct values.
Here is a sketch of the solution (see the attachment).

--
regards, Andrei Lepikhov

Attachments:

0001-Improve-group-number-estimation.patchtext/x-patch; charset=UTF-8; name=0001-Improve-group-number-estimation.patchDownload
From 3d97c3aa36170460a82e66a06af13f95ed4a37c0 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 29 Oct 2024 08:49:33 +0700
Subject: [PATCH] Improve group number estimation.

The GROUP BY x estimation employs the distinct statistic for column x. But if
we have an expression like 'x=y' somewhere down the query tree, the number of
distinct values can't be more than the smaller of the distinct counts of the
columns 'x' and 'y'. That means it is possible to correct the estimate with
the knowledge provided by the equivalence class.

In this commit, the estimate_num_groups routine is changed to include PathKey
nodes in the presortedExprs list. With the PathKey node, we can pass through
its equivalence class members and correct the distinct estimation.

To avoid multiple calls on statistic tuples, the em_ndistinct cache field
is introduced.
---
 src/backend/optimizer/path/costsize.c         | 40 ++-------
 src/backend/optimizer/path/equivclass.c       |  1 +
 src/backend/utils/adt/selfuncs.c              | 87 ++++++++++++++++++-
 src/include/nodes/pathnodes.h                 |  2 +
 .../regress/expected/incremental_sort.out     | 51 +++++++++++
 src/test/regress/sql/incremental_sort.sql     | 31 +++++++
 6 files changed, 180 insertions(+), 32 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 2bb6db1df7..fad63f694d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2008,13 +2008,12 @@ cost_incremental_sort(Path *path,
 				run_cost,
 				input_run_cost = input_total_cost - input_startup_cost;
 	double		group_tuples,
-				input_groups;
+				input_groups,
+				estimated_groups;
 	Cost		group_startup_cost,
 				group_run_cost,
 				group_input_run_cost;
 	List	   *presortedExprs = NIL;
-	ListCell   *l;
-	bool		unknown_varno = false;
 
 	Assert(presorted_keys > 0 && presorted_keys < list_length(pathkeys));
 
@@ -2025,9 +2024,6 @@ cost_incremental_sort(Path *path,
 	if (input_tuples < 2.0)
 		input_tuples = 2.0;
 
-	/* Default estimate of number of groups, capped to one group per row. */
-	input_groups = Min(input_tuples, DEFAULT_NUM_DISTINCT);
-
 	/*
 	 * Extract presorted keys as list of expressions.
 	 *
@@ -2050,33 +2046,15 @@ cost_incremental_sort(Path *path,
 	 * anyway - from that standpoint the DEFAULT_NUM_DISTINCT is defensive
 	 * while maintaining lower startup cost.
 	 */
-	foreach(l, pathkeys)
-	{
-		PathKey    *key = (PathKey *) lfirst(l);
-		EquivalenceMember *member = (EquivalenceMember *)
-			linitial(key->pk_eclass->ec_members);
-
-		/*
-		 * Check if the expression contains Var with "varno 0" so that we
-		 * don't call estimate_num_groups in that case.
-		 */
-		if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr)))
-		{
-			unknown_varno = true;
-			break;
-		}
+	presortedExprs = list_copy_head(pathkeys, presorted_keys);
 
-		/* expression not containing any Vars with "varno 0" */
-		presortedExprs = lappend(presortedExprs, member->em_expr);
-
-		if (foreach_current_index(l) + 1 >= presorted_keys)
-			break;
-	}
-
-	/* Estimate the number of groups with equal presorted keys. */
-	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+	estimated_groups = estimate_num_groups(root, presortedExprs, input_tuples,
 										   NULL, NULL);
+	if (estimated_groups > 0.0)
+		input_groups = estimated_groups;
+	else
+		/* Default estimate of number of groups, capped to one group per row. */
+		input_groups = Min(input_tuples, DEFAULT_NUM_DISTINCT);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index fae137dd82..3552614502 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -525,6 +525,7 @@ add_eq_member(EquivalenceClass *ec, Expr *expr, Relids relids,
 	em->em_datatype = datatype;
 	em->em_jdomain = jdomain;
 	em->em_parent = parent;
+	em->em_ndistinct = -1.0;
 
 	if (bms_is_empty(relids))
 	{
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 08fa6774d9..e49101f50e 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -213,6 +213,8 @@ static bool get_actual_variable_endpoint(Relation heapRel,
 										 MemoryContext outercontext,
 										 Datum *endpointDatum);
 static RelOptInfo *find_join_input_rel(PlannerInfo *root, Relids relids);
+static EquivalenceMember *identify_sort_ecmember(PlannerInfo *root,
+												 EquivalenceClass *ec);
 
 
 /*
@@ -3458,12 +3460,34 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	i = 0;
 	foreach(l, groupExprs)
 	{
-		Node	   *groupexpr = (Node *) lfirst(l);
+		Node	   *node = (Node *) lfirst(l);
+		Node	   *groupexpr;
 		double		this_srf_multiplier;
 		VariableStatData vardata;
 		List	   *varshere;
 		ListCell   *l2;
 
+		/* Find an expression beforehand */
+		if (IsA(node, PathKey))
+		{
+			PathKey			   *key = (PathKey *) node;
+			EquivalenceMember  *em = identify_sort_ecmember(root, key->pk_eclass);
+
+			/*
+			 * Check if the expression contains Var with "varno 0" so that we
+			 * don't call estimate_num_groups in that case.
+			 */
+			if (bms_is_member(0, pull_varnos(root, (Node *) em->em_expr)))
+			{
+				/* Return 'unknown' value */
+				return 0.0;
+			}
+
+			groupexpr = (Node *) em->em_expr;
+		}
+		else
+			groupexpr = node;
+
 		/* is expression in this grouping set? */
 		if (pgset && !list_member_int(*pgset, i++))
 			continue;
@@ -8197,3 +8221,64 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 
 	*indexPages = index->pages;
 }
+
+/*
+ * Find suitable member of the equivalence class.
+ * Passing through the list of EC members find the member with minimum of
+ * distinct values. Cache estimated number of distincts in the em_ndistinct
+ * field of each member.
+ */
+static EquivalenceMember *
+identify_sort_ecmember(PlannerInfo *root, EquivalenceClass *ec)
+{
+	EquivalenceMember  *candidate = linitial(ec->ec_members);
+
+	if (root == NULL)
+		/* Fast path */
+		return candidate;
+
+	foreach_node(EquivalenceMember, em, ec->ec_members)
+	{
+		VariableStatData	vardata;
+
+		if (em->em_ndistinct == 0.)
+			/* Nothing helpful */
+			continue;
+
+		if (em->em_is_child || em->em_is_const || bms_is_empty(em->em_relids) ||
+			bms_is_member(0, em->em_relids))
+		{
+			em->em_ndistinct = 0.;
+			continue;
+		}
+
+		if (em->em_ndistinct < 0.)
+		{
+			bool	isdefault = true;
+			double	ndist = 0.;
+
+			/* Let's check candidate's ndistinct value */
+			examine_variable(root, (Node *) em->em_expr, 0, &vardata);
+			if (HeapTupleIsValid(vardata.statsTuple))
+				ndist = get_variable_numdistinct(&vardata, &isdefault);
+			ReleaseVariableStats(vardata);
+
+			if (isdefault)
+			{
+				em->em_ndistinct = 0.;
+				continue;
+			}
+
+			em->em_ndistinct = ndist;
+		}
+
+		Assert(em->em_ndistinct > 0.);
+
+		if (candidate->em_ndistinct == 0. ||
+			em->em_ndistinct < candidate->em_ndistinct)
+			candidate = em;
+	}
+
+	Assert(candidate != NULL);
+	return candidate;
+}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index add0f9e45f..4ef5256a7f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1448,6 +1448,8 @@ typedef struct EquivalenceMember
 	JoinDomain *em_jdomain;		/* join domain containing the source clause */
 	/* if em_is_child is true, this links to corresponding EM for top parent */
 	struct EquivalenceMember *em_parent pg_node_attr(read_write_ignore);
+
+	double em_ndistinct; /* cached value of ndistinct: 0- default value or 'unknown'; -1 - not defined yet */
 } EquivalenceMember;
 
 /*
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index 2df7a5db12..409eb8a4b8 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1722,3 +1722,54 @@ order by t1.four, t1.two limit 1;
                ->  Seq Scan on tenk1 t2
 (12 rows)
 
+-- Check:
+-- commuting of sides in an expression doesn't influence cost estimation
+CREATE TABLE sort_ndist_t1 (x numeric, y numeric);
+CREATE TABLE sort_ndist_t2 (x numeric, y numeric);
+INSERT INTO sort_ndist_t1 (x,y)
+  SELECT gs%10, gs%10 FROM generate_series(1,1E4) AS gs;
+INSERT INTO sort_ndist_t2 (x,y)
+  SELECT gs, gs FROM generate_series(1,1E4) AS gs;
+CREATE INDEX t1_idx ON sort_ndist_t1 (x);
+CREATE INDEX t2_idx ON sort_ndist_t2 (x);
+VACUUM ANALYZE sort_ndist_t1, sort_ndist_t2;
+SET enable_hashjoin = 'off';
+-- Having lots of duplicates after the join it is more effective to use plain
+-- Sort instead of incremental sort: with small number of groups we do the same
+-- stuff like Sort but with extra penalty.
+EXPLAIN (COSTS OFF)
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t1.x=t2.x
+ORDER BY t1.x,t1.y;
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Sort
+   Sort Key: t1.x, t1.y
+   ->  Nested Loop
+         ->  Seq Scan on sort_ndist_t1 t1
+         ->  Memoize
+               Cache Key: t1.x
+               Cache Mode: logical
+               ->  Index Only Scan using t2_idx on sort_ndist_t2 t2
+                     Index Cond: (x = t1.x)
+(9 rows)
+
+EXPLAIN (COSTS OFF) -- the plan must be the same as above
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t2.x=t1.x
+ORDER BY t1.x,t1.y;
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Sort
+   Sort Key: t1.x, t1.y
+   ->  Nested Loop
+         ->  Seq Scan on sort_ndist_t1 t1
+         ->  Memoize
+               Cache Key: t1.x
+               Cache Mode: logical
+               ->  Index Only Scan using t2_idx on sort_ndist_t2 t2
+                     Index Cond: (x = t1.x)
+(9 rows)
+
+RESET enable_hashjoin;
+DROP TABLE sort_ndist_t1, sort_ndist_t2;
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index 98b20e17e1..af4d145395 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -298,3 +298,34 @@ explain (costs off)
 select * from
   (select * from tenk1 order by four) t1 join tenk1 t2 on t1.four = t2.four and t1.two = t2.two
 order by t1.four, t1.two limit 1;
+
+-- Check:
+-- commuting of sides in an expression doesn't influence cost estimation
+CREATE TABLE sort_ndist_t1 (x numeric, y numeric);
+CREATE TABLE sort_ndist_t2 (x numeric, y numeric);
+
+INSERT INTO sort_ndist_t1 (x,y)
+  SELECT gs%10, gs%10 FROM generate_series(1,1E4) AS gs;
+INSERT INTO sort_ndist_t2 (x,y)
+  SELECT gs, gs FROM generate_series(1,1E4) AS gs;
+CREATE INDEX t1_idx ON sort_ndist_t1 (x);
+CREATE INDEX t2_idx ON sort_ndist_t2 (x);
+VACUUM ANALYZE sort_ndist_t1, sort_ndist_t2;
+
+SET enable_hashjoin = 'off';
+
+-- Having lots of duplicates after the join it is more effective to use plain
+-- Sort instead of incremental sort: with small number of groups we do the same
+-- stuff like Sort but with extra penalty.
+EXPLAIN (COSTS OFF)
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t1.x=t2.x
+ORDER BY t1.x,t1.y;
+
+EXPLAIN (COSTS OFF) -- the plan must be the same as above
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t2.x=t1.x
+ORDER BY t1.x,t1.y;
+
+RESET enable_hashjoin;
+DROP TABLE sort_ndist_t1, sort_ndist_t2;
-- 
2.39.5

#15Andrei Lepikhov
lepihov@gmail.com
In reply to: Tomas Vondra (#5)
1 attachment(s)
Re: Incremental Sort Cost Estimation Instability

On 9/12/24 16:57, Tomas Vondra wrote:

On 9/12/24 12:12, David Rowley wrote:

On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:

Initial problem causes wrong cost_sort estimation. Right now I think
about providing cost_sort() the sort clauses instead of (or in addition
to) the pathkeys.

I'm not quite sure why the sort clauses matter any more than the
EquivalenceClass. If the EquivalanceClass defines that all members
will have the same value for any given row, then, if we had to choose
any single member to drive the n_distinct estimate from, isn't the
most accurate distinct estimate from the member with the smallest
n_distinct estimate? (That assumes the less distinct member has every
value the more distinct member has, which might not be true)

I'm not sure how to fix this, but it seems estimate_num_groups() needs
to do things differently. And I agree looking for the minimum ndistinct
seems like the right approach. but doesn't estimate_num_groups()
supposed to already do that? The comment says:

Hi,

I have rewritten the code according to the idea that it is
estimate_num_groups's responsibility to adjust the estimate according to the EC.
I haven't changed all the places where ngroups is estimated - only those where
the planner can access pathkeys, to avoid the overhead of passing through all
the ECs. I also added a cache in the EM.
The most important cases for me here are GROUP-BY estimation and
create_unique_path, because I frequently see them as sides of a JOIN.
It would also be nice to adjust the cost of memoize rescan, but that may
be a subject for future improvement.

--
regards, Andrei Lepikhov

Attachments:

v0-0001-Employ-EquivalenceClass-to-adjust-ndistinct-estim.patchtext/x-patch; charset=UTF-8; name=v0-0001-Employ-EquivalenceClass-to-adjust-ndistinct-estim.patchDownload
From 18a320cf9eee52299909f50533ef49cf2af2c69e Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Wed, 14 May 2025 12:36:10 +0200
Subject: [PATCH v0] Employ EquivalenceClass to adjust ndistinct estimation.

Operations like grouping, incremental sort, hash join, memoize, etc., estimate
the number of groups in a source using the statistics of a column
or expression.
Equivalence clauses like 'x=y' obviously cap the maximum number of distinct
values above the clause evaluation node at the smaller of the ndistinct values
of the 'x' and 'y' sources. Therefore, when estimating the number of groups,
the planner may make use of this information from the EC.
Identifying the proper EC for an arbitrary expression seems too expensive an
operation. However, having a pathkey gives the planner immediate access to
the EC. This commit provides a routine to identify the proper expression; the
ndistinct estimate is cached inside the EM.
---
 contrib/postgres_fdw/postgres_fdw.c           |   2 +-
 src/backend/optimizer/path/costsize.c         | 130 ++++++++++--------
 src/backend/optimizer/path/equivclass.c       |   1 +
 src/backend/optimizer/path/indxpath.c         |   1 +
 src/backend/optimizer/plan/createplan.c       |   1 +
 src/backend/optimizer/plan/planner.c          |  29 ++--
 src/backend/optimizer/prep/prepunion.c        |   1 +
 src/backend/optimizer/util/pathnode.c         |   2 +
 src/backend/utils/adt/selfuncs.c              |  41 +++++-
 src/include/nodes/pathnodes.h                 |   5 +
 src/include/optimizer/cost.h                  |   5 +-
 src/include/utils/selfuncs.h                  |   2 +-
 .../regress/expected/incremental_sort.out     |  50 +++++++
 src/test/regress/expected/stats.out           |  17 +++
 src/test/regress/sql/incremental_sort.sql     |  29 ++++
 src/test/regress/sql/stats.sql                |   8 ++
 16 files changed, 249 insertions(+), 75 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 331f3fc088d..09766daf97c 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3370,7 +3370,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->processed_groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL, NULL);
+											input_rows, NULL, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 3d44815ed5a..5b51d020b7f 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1999,7 +1999,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 void
 cost_incremental_sort(Path *path,
 					  PlannerInfo *root, List *pathkeys, int presorted_keys,
-					  int input_disabled_nodes,
+					  int input_disabled_nodes, Relids relids,
 					  Cost input_startup_cost, Cost input_total_cost,
 					  double input_tuples, int width, Cost comparison_cost, int sort_mem,
 					  double limit_tuples)
@@ -2012,9 +2012,6 @@ cost_incremental_sort(Path *path,
 	Cost		group_startup_cost,
 				group_run_cost,
 				group_input_run_cost;
-	List	   *presortedExprs = NIL;
-	ListCell   *l;
-	bool		unknown_varno = false;
 
 	Assert(presorted_keys > 0 && presorted_keys < list_length(pathkeys));
 
@@ -2025,58 +2022,10 @@ cost_incremental_sort(Path *path,
 	if (input_tuples < 2.0)
 		input_tuples = 2.0;
 
-	/* Default estimate of number of groups, capped to one group per row. */
-	input_groups = Min(input_tuples, DEFAULT_NUM_DISTINCT);
-
-	/*
-	 * Extract presorted keys as list of expressions.
-	 *
-	 * We need to be careful about Vars containing "varno 0" which might have
-	 * been introduced by generate_append_tlist, which would confuse
-	 * estimate_num_groups (in fact it'd fail for such expressions). See
-	 * recurse_set_operations which has to deal with the same issue.
-	 *
-	 * Unlike recurse_set_operations we can't access the original target list
-	 * here, and even if we could it's not very clear how useful would that be
-	 * for a set operation combining multiple tables. So we simply detect if
-	 * there are any expressions with "varno 0" and use the default
-	 * DEFAULT_NUM_DISTINCT in that case.
-	 *
-	 * We might also use either 1.0 (a single group) or input_tuples (each row
-	 * being a separate group), pretty much the worst and best case for
-	 * incremental sort. But those are extreme cases and using something in
-	 * between seems reasonable. Furthermore, generate_append_tlist is used
-	 * for set operations, which are likely to produce mostly unique output
-	 * anyway - from that standpoint the DEFAULT_NUM_DISTINCT is defensive
-	 * while maintaining lower startup cost.
-	 */
-	foreach(l, pathkeys)
-	{
-		PathKey    *key = (PathKey *) lfirst(l);
-		EquivalenceMember *member = (EquivalenceMember *)
-			linitial(key->pk_eclass->ec_members);
-
-		/*
-		 * Check if the expression contains Var with "varno 0" so that we
-		 * don't call estimate_num_groups in that case.
-		 */
-		if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr)))
-		{
-			unknown_varno = true;
-			break;
-		}
-
-		/* expression not containing any Vars with "varno 0" */
-		presortedExprs = lappend(presortedExprs, member->em_expr);
-
-		if (foreach_current_index(l) + 1 >= presorted_keys)
-			break;
-	}
-
 	/* Estimate the number of groups with equal presorted keys. */
-	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
-										   NULL, NULL);
+	input_groups = estimate_num_groups(root,
+									   list_copy_head(pathkeys, presorted_keys),
+									   input_tuples, NULL, NULL, relids);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
@@ -2579,7 +2528,7 @@ cost_memoize_rescan(PlannerInfo *root, MemoizePath *mpath,
 
 	/* estimate on the distinct number of parameter values */
 	ndistinct = estimate_num_groups(root, mpath->param_exprs, calls, NULL,
-									&estinfo);
+									&estinfo, NULL);
 
 	/*
 	 * When the estimation fell back on using a default value, it's a bit too
@@ -2900,7 +2849,7 @@ get_windowclause_startup_tuples(PlannerInfo *root, WindowClause *wc,
 														root->parse->targetList);
 
 		num_partitions = estimate_num_groups(root, partexprs, input_tuples,
-											 NULL, NULL);
+											 NULL, NULL, NULL);
 		list_free(partexprs);
 
 		partition_tuples = input_tuples / num_partitions;
@@ -2923,7 +2872,7 @@ get_windowclause_startup_tuples(PlannerInfo *root, WindowClause *wc,
 		/* estimate out how many peer groups there are in the partition */
 		num_groups = estimate_num_groups(root, orderexprs,
 										 partition_tuples, NULL,
-										 NULL);
+										 NULL, NULL);
 		list_free(orderexprs);
 		peer_tuples = partition_tuples / num_groups;
 	}
@@ -3703,6 +3652,7 @@ initial_cost_mergejoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 								  outersortkeys,
 								  outer_presorted_keys,
 								  outer_path->disabled_nodes,
+								  outer_path->parent->relids,
 								  outer_path->startup_cost,
 								  outer_path->total_cost,
 								  outer_path_rows,
@@ -6614,3 +6564,67 @@ compute_gather_rows(Path *path)
 
 	return clamp_row_est(path->rows * get_parallel_divisor(path));
 }
+
+/*
+ * Find a suitable member of the equivalence class.
+ * Walk through the list of EC members and pick the member with the minimum
+ * number of distinct values, caching each member's ndistinct estimate in
+ * its em_ndistinct field along the way.
+ *
+ * Return NULL if no proper member is found (i.e., every member contains varno 0).
+ */
+EquivalenceMember *
+identify_proper_ecmember(PlannerInfo *root, EquivalenceClass *ec, Relids relids)
+{
+	EquivalenceMember		   *candidate = NULL;
+	EquivalenceMember		   *em;
+	EquivalenceMemberIterator	it;
+
+	setup_eclass_member_iterator(&it, ec, relids);
+	while ((em = eclass_member_iterator_next(&it)) != NULL)
+	{
+		VariableStatData	vardata;
+
+		if (bms_is_member(0, em->em_relids))
+			continue;
+
+		if (em->em_is_const || bms_is_empty(em->em_relids))
+		{
+			/* Trivial case. Set up cache values and go further */
+			em->em_default_nd = false;
+			em->em_ndistinct = 1.0;
+		}
+		else if (relids && !bms_is_subset(em->em_relids, relids))
+			continue;
+
+		if (em->em_ndistinct < 0.)
+		{
+			/* Let's check candidate's ndistinct value */
+			examine_variable(root, (Node *) em->em_expr, 0, &vardata);
+			if (HeapTupleIsValid(vardata.statsTuple))
+				em->em_ndistinct =
+						get_variable_numdistinct(&vardata, &em->em_default_nd);
+			else
+			{
+				em->em_ndistinct = 0.0;
+				em->em_default_nd = true;
+			}
+			ReleaseVariableStats(vardata);
+		}
+
+		if (candidate == NULL)
+			candidate = em;
+
+		if (em->em_default_nd)
+			/* Nothing helpful */
+			continue;
+
+		Assert(em->em_ndistinct > 0.);
+
+		if (candidate->em_ndistinct == 0. ||
+			em->em_ndistinct < candidate->em_ndistinct)
+			candidate = em;
+	}
+
+	return candidate;
+}
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 441f12f6c50..422241aa7f1 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -601,6 +601,7 @@ make_eq_member(EquivalenceClass *ec, Expr *expr, Relids relids,
 	em->em_datatype = datatype;
 	em->em_jdomain = jdomain;
 	em->em_parent = parent;
+	em->em_ndistinct = -1.0;
 
 	if (bms_is_empty(relids))
 	{
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 601354ea3e0..d3354506813 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2400,6 +2400,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
 										  NULL,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 4ad30b7627e..cb728e1e12d 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5523,6 +5523,7 @@ label_incrementalsort_with_costsize(PlannerInfo *root, IncrementalSort *plan,
 	cost_incremental_sort(&sort_path, root, pathkeys,
 						  plan->nPresortedCols,
 						  plan->sort.plan.disabled_nodes,
+						  NULL,
 						  lefttree->startup_cost,
 						  lefttree->total_cost,
 						  lefttree->plan_rows,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index beafac8c0b0..c6881362637 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -144,6 +144,7 @@ static List *extract_rollup_sets(List *groupingSets);
 static List *reorder_grouping_sets(List *groupingSets, List *sortclause);
 static void standard_qp_callback(PlannerInfo *root, void *extra);
 static double get_number_of_groups(PlannerInfo *root,
+								   Relids relids,
 								   double path_rows,
 								   grouping_sets_data *gd,
 								   List *target_list);
@@ -3599,7 +3600,7 @@ standard_qp_callback(PlannerInfo *root, void *extra)
  * determining whether some combination of them could be hashed instead.
  */
 static double
-get_number_of_groups(PlannerInfo *root,
+get_number_of_groups(PlannerInfo *root, Relids relids,
 					 double path_rows,
 					 grouping_sets_data *gd,
 					 List *target_list)
@@ -3639,7 +3640,8 @@ get_number_of_groups(PlannerInfo *root,
 																groupExprs,
 																path_rows,
 																&gset,
-																NULL);
+																NULL,
+																relids);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3665,7 +3667,8 @@ get_number_of_groups(PlannerInfo *root,
 																groupExprs,
 																path_rows,
 																&gset,
-																NULL);
+																NULL,
+																relids);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3676,12 +3679,12 @@ get_number_of_groups(PlannerInfo *root,
 		}
 		else
 		{
-			/* Plain GROUP BY -- estimate based on optimized groupClause */
-			groupExprs = get_sortgrouplist_exprs(root->processed_groupClause,
-												 target_list);
+			List *pathkeys = list_copy_head(root->group_pathkeys,
+											root->num_groupby_pathkeys);
 
-			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL, NULL);
+			/* Plain GROUP BY -- estimate based on grouping pathkeys */
+			dNumGroups = estimate_num_groups(root, pathkeys, path_rows,
+											 NULL, NULL, relids);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4071,7 +4074,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	/*
 	 * Estimate number of groups.
 	 */
-	dNumGroups = get_number_of_groups(root,
+	dNumGroups = get_number_of_groups(root, cheapest_path->parent->relids,
 									  cheapest_path->rows,
 									  gd,
 									  extra->targetList);
@@ -4843,7 +4846,7 @@ create_partial_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	/* estimate how many distinct rows we'll get from each worker */
 	numDistinctRows = estimate_num_groups(root, distinctExprs,
 										  cheapest_partial_path->rows,
-										  NULL, NULL);
+										  NULL, NULL, NULL);
 
 	/*
 	 * Try sorting the cheapest path and incrementally sorting any paths with
@@ -5014,7 +5017,7 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL, NULL);
+											  NULL, NULL, NULL);
 	}
 
 	/*
@@ -7355,13 +7358,13 @@ create_partial_grouping_paths(PlannerInfo *root,
 	/* Estimate number of partial groups. */
 	if (cheapest_total_path != NULL)
 		dNumPartialGroups =
-			get_number_of_groups(root,
+			get_number_of_groups(root, input_rel->relids,
 								 cheapest_total_path->rows,
 								 gd,
 								 extra->targetList);
 	if (cheapest_partial_path != NULL)
 		dNumPartialPartialGroups =
-			get_number_of_groups(root,
+			get_number_of_groups(root, input_rel->relids,
 								 cheapest_partial_path->rows,
 								 gd,
 								 extra->targetList);
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index eab44da65b8..e094ad103d6 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -665,6 +665,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
 											  get_tlist_exprs(subroot->parse->targetList, false),
 											  rel->cheapest_total_path->rows,
 											  NULL,
+											  NULL,
 											  NULL);
 	}
 }
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e0192d4a491..9652094b47b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1858,6 +1858,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
 											  NULL,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
@@ -3059,6 +3060,7 @@ create_incremental_sort_path(PlannerInfo *root,
 	cost_incremental_sort(&pathnode->path,
 						  root, pathkeys, presorted_keys,
 						  subpath->disabled_nodes,
+						  subpath->parent->relids,
 						  subpath->startup_cost,
 						  subpath->total_cost,
 						  subpath->rows,
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index a96b1b9c0bc..1ab5863338f 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3444,7 +3444,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset, EstimationInfo *estinfo)
+					List **pgset, EstimationInfo *estinfo, Relids relids)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3481,6 +3481,27 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	 */
 	numdistinct = 1.0;
 
+	/*
+	 * We need to be careful about Vars containing "varno 0" which might have
+	 * been introduced by generate_append_tlist, which would confuse
+	 * estimate_num_groups (in fact it'd fail for such expressions). See
+	 * recurse_set_operations which has to deal with the same issue.
+	 *
+	 * Unlike recurse_set_operations we can't access the original target list
+	 * here, and even if we could it's not very clear how useful would that be
+	 * for a set operation combining multiple tables. So we simply detect if
+	 * there are any expressions with "varno 0" and use the default
+	 * DEFAULT_NUM_DISTINCT in that case.
+	 *
+	 * We might also use either 1.0 (a single group) or input_tuples (each row
+	 * being a separate group), pretty much the worst and best case for
+	 * incremental sort. But those are extreme cases and using something in
+	 * between seems reasonable. Furthermore, generate_append_tlist is used
+	 * for set operations, which are likely to produce mostly unique output
+	 * anyway - from that standpoint the DEFAULT_NUM_DISTINCT is defensive
+	 * while maintaining lower startup cost.
+	 */
+
 	i = 0;
 	foreach(l, groupExprs)
 	{
@@ -3494,6 +3515,24 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 		if (pgset && !list_member_int(*pgset, i++))
 			continue;
 
+		/*
+		 * A PathKey doesn't provide an expression explicitly. For the sake of
+		 * estimation we may choose any member of the equivalence class that
+		 * provides the least number of distinct values and evaluates under
+		 * the relids provided.
+		 */
+		if (IsA(groupexpr, PathKey))
+		{
+			PathKey			   *pathkey = (PathKey *) groupexpr;
+			EquivalenceMember  *em;
+
+			em = identify_proper_ecmember(root, pathkey->pk_eclass, relids);
+			if (em == NULL)
+				/* The case of varno 0 */
+				return Min(input_rows, DEFAULT_NUM_DISTINCT);
+			groupexpr = (Node *) em->em_expr;
+		}
+
 		/*
 		 * Set-returning functions in grouping columns are a bit problematic.
 		 * The code below will effectively ignore their SRF nature and come up
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1dd2d1560cb..203445eda7e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1501,6 +1501,8 @@ typedef struct EquivalenceClass
  * different when dealing with a binary-compatible opfamily; in particular
  * anyarray_ops would never work without this.  Use em_datatype when
  * looking up a specific btree operator to work with this expression.
+ * em_ndistinct caches the statistical ndistinct estimate for the specific
+ * expression.  Special values: 0 - no statistics available; -1 - not initialised.
  */
 typedef struct EquivalenceMember
 {
@@ -1516,6 +1518,9 @@ typedef struct EquivalenceMember
 	JoinDomain *em_jdomain;		/* join domain containing the source clause */
 	/* if em_is_child is true, this links to corresponding EM for top parent */
 	struct EquivalenceMember *em_parent pg_node_attr(read_write_ignore);
+
+	double		em_ndistinct;		/* cached ndistinct value */
+	bool		em_default_nd;
 } EquivalenceMember;
 
 /*
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index d397fe27dc1..984104433fd 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -114,7 +114,7 @@ extern void cost_sort(Path *path, PlannerInfo *root,
 					  double limit_tuples);
 extern void cost_incremental_sort(Path *path,
 								  PlannerInfo *root, List *pathkeys, int presorted_keys,
-								  int input_disabled_nodes,
+								  int input_disabled_nodes, Relids relids,
 								  Cost input_startup_cost, Cost input_total_cost,
 								  double input_tuples, int width, Cost comparison_cost, int sort_mem,
 								  double limit_tuples);
@@ -222,5 +222,8 @@ extern double compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel,
 								   Path *bitmapqual, double loop_count,
 								   Cost *cost_p, double *tuples_p);
 extern double compute_gather_rows(Path *path);
+extern EquivalenceMember *identify_proper_ecmember(PlannerInfo *root,
+												 EquivalenceClass *ec,
+												 Relids relids);
 
 #endif							/* COST_H */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 013049b3098..665d659497f 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -216,7 +216,7 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
 								  double input_rows, List **pgset,
-								  EstimationInfo *estinfo);
+								  EstimationInfo *estinfo, Relids relids);
 
 extern List *estimate_multivariate_bucketsize(PlannerInfo *root,
 											  RelOptInfo *inner,
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index b00219643b9..b23dc13ca6f 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1722,3 +1722,53 @@ order by t1.four, t1.two limit 1;
                ->  Seq Scan on tenk1 t2
 (12 rows)
 
+--
+-- Commuting of sides in an expression doesn't influence cost estimation
+--
+CREATE TABLE sort_ndist_t1 (x numeric, y numeric);
+CREATE TABLE sort_ndist_t2 (x numeric, y numeric);
+INSERT INTO sort_ndist_t1 (x,y)
+  SELECT gs%10, gs%1000 FROM generate_series(1,1E4) AS gs;
+INSERT INTO sort_ndist_t2 (x,y)
+  SELECT gs, gs FROM generate_series(1,1E4) AS gs;
+CREATE INDEX t1_idx ON sort_ndist_t1 (x);
+CREATE INDEX t2_idx ON sort_ndist_t2 (x);
+VACUUM ANALYZE sort_ndist_t1, sort_ndist_t2;
+SET enable_hashjoin = 'off';
+-- With lots of duplicates after the join it is more efficient to use a plain
+-- Sort instead of an Incremental Sort.
+EXPLAIN (COSTS OFF)
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t1.x=t2.x
+ORDER BY t1.x,t1.y;
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Sort
+   Sort Key: t1.x, t1.y
+   ->  Nested Loop
+         ->  Seq Scan on sort_ndist_t1 t1
+         ->  Memoize
+               Cache Key: t1.x
+               Cache Mode: logical
+               ->  Index Only Scan using t2_idx on sort_ndist_t2 t2
+                     Index Cond: (x = t1.x)
+(9 rows)
+
+EXPLAIN (COSTS OFF) -- the plan must be the same as above
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t2.x=t1.x
+ORDER BY t1.x,t1.y;
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Sort
+   Sort Key: t1.x, t1.y
+   ->  Nested Loop
+         ->  Seq Scan on sort_ndist_t1 t1
+         ->  Memoize
+               Cache Key: t1.x
+               Cache Mode: logical
+               ->  Index Only Scan using t2_idx on sort_ndist_t2 t2
+                     Index Cond: (x = t1.x)
+(9 rows)
+
+RESET enable_hashjoin;
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 776f1ad0e53..a04b580c643 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1868,4 +1868,21 @@ SELECT * FROM check_estimated_rows('SELECT * FROM table_fillfactor');
 (1 row)
 
 DROP TABLE table_fillfactor;
+-- The estimation of the number of groups is based on the equivalence class
+-- definition and shouldn't change if an alternative member of the same EC
+-- is chosen.
+SELECT * FROM check_estimated_rows('
+  SELECT count(*) FROM sort_ndist_t1 t1 WHERE t1.x=t1.y GROUP BY t1.x');
+ estimated | actual 
+-----------+--------
+        10 |     10
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+  SELECT count(*) FROM sort_ndist_t1 t1 WHERE t1.x=t1.y GROUP BY t1.y');
+ estimated | actual 
+-----------+--------
+        10 |     10
+(1 row)
+
 -- End of Stats Test
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index f1f8fae5654..298a2782b95 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -298,3 +298,32 @@ explain (costs off)
 select * from
   (select * from tenk1 order by four) t1 join tenk1 t2 on t1.four = t2.four and t1.two = t2.two
 order by t1.four, t1.two limit 1;
+
+--
+-- Commuting of sides in an expression doesn't influence cost estimation
+--
+CREATE TABLE sort_ndist_t1 (x numeric, y numeric);
+CREATE TABLE sort_ndist_t2 (x numeric, y numeric);
+
+INSERT INTO sort_ndist_t1 (x,y)
+  SELECT gs%10, gs%1000 FROM generate_series(1,1E4) AS gs;
+INSERT INTO sort_ndist_t2 (x,y)
+  SELECT gs, gs FROM generate_series(1,1E4) AS gs;
+CREATE INDEX t1_idx ON sort_ndist_t1 (x);
+CREATE INDEX t2_idx ON sort_ndist_t2 (x);
+VACUUM ANALYZE sort_ndist_t1, sort_ndist_t2;
+
+SET enable_hashjoin = 'off';
+
+-- With lots of duplicates after the join it is more efficient to use a plain
+-- Sort instead of an Incremental Sort.
+EXPLAIN (COSTS OFF)
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t1.x=t2.x
+ORDER BY t1.x,t1.y;
+EXPLAIN (COSTS OFF) -- the plan must be the same as above
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t2.x=t1.x
+ORDER BY t1.x,t1.y;
+
+RESET enable_hashjoin;
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 232ab8db8fa..b4d3d6d0862 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -925,4 +925,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM table_fillfactor');
 
 DROP TABLE table_fillfactor;
 
+-- The estimation of the number of groups is based on the equivalence class
+-- definition and shouldn't change if an alternative member of the same EC
+-- is chosen.
+SELECT * FROM check_estimated_rows('
+  SELECT count(*) FROM sort_ndist_t1 t1 WHERE t1.x=t1.y GROUP BY t1.x');
+SELECT * FROM check_estimated_rows('
+  SELECT count(*) FROM sort_ndist_t1 t1 WHERE t1.x=t1.y GROUP BY t1.y');
+
 -- End of Stats Test
-- 
2.39.5

#16Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrei Lepikhov (#15)
Re: Incremental Sort Cost Estimation Instability

Hi, Andrei!

On Wed, May 14, 2025 at 1:50 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

On 9/12/24 16:57, Tomas Vondra wrote:

On 9/12/24 12:12, David Rowley wrote:

On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:

Initial problem causes wrong cost_sort estimation. Right now I think
about providing cost_sort() the sort clauses instead of (or in addition
to) the pathkeys.

I'm not quite sure why the sort clauses matter any more than the
EquivalenceClass. If the EquivalanceClass defines that all members
will have the same value for any given row, then, if we had to choose
any single member to drive the n_distinct estimate from, isn't the
most accurate distinct estimate from the member with the smallest
n_distinct estimate? (That assumes the less distinct member has every
value the more distinct member has, which might not be true)

I'm not sure how to fix this, but it seems estimate_num_groups() needs
to do things differently. And I agree looking for the minimum ndistinct
seems like the right approach. but doesn't estimate_num_groups()
supposed to already do that? The comment says:

Hi,

I have rewritten the code according to the idea of the
estimate_num_groups responsibility to adjust estimation according to the EC.
I haven't changed all the places of ngroups estimation - only where the
planner can access pathkeys to avoid the overhead of passing through all
the ECs. And added some cache in the EM.
The most important case for me here is GROUP-BY estimation and
create_unique_path because I frequently see them as sides of a JOIN.
It would be also nice to adjust the cost of memoize rescan, but it may
be a subject of the future improvement.

Thank you for your work on this subject. I have the following notes
about the last patch.
1. Now the groupExprs argument of estimate_num_groups() might contain
either Exprs or PathKeys. I think this should be documented.
2. The new relids argument of estimate_num_groups() should also be documented.
3. I doubt estimate_num_groups() is the appropriate place to handle "varno
0". Should we do this in cost_incremental_sort() as before? I see
that it currently analyzes only the first EquivalenceMember. But could the
answer differ between the first EquivalenceMember and the others?

------
Regards,
Alexander Korotkov
Supabase

#17Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#16)
Re: Incremental Sort Cost Estimation Instability

On Tue, Jun 3, 2025 at 5:02 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:


Thank you for your work on this subject. I've the following notes
about the last patch.
1. Now groupExprs argument of estimate_num_groups() might contain
either Expr's or Pathkey's. I think this should be documented.
2. A new argument relids of estimate_num_groups() also should be documented.
3. I doubt estimate_num_groups() is appropriate place to handle "varno
0". Should we do this in cost_incremental_sort() as before? I see
that it currently analyzes only first EquivalenceMember. But could it
be a different answers between first EquivalenceMember and others?

4. The new EquivalenceMember.em_default_nd field is not documented. Is
it necessary to keep a dedicated field for this? Could we use just a
special value of em_ndistinct (which would need to be documented as
well)?
5. I think the algorithm of distinct-value estimation needs more
clarification. Given that all equivalence members within the relids
should be equal, the cardinality might be lower than the cardinality of
any member. Do we assume the lowest cardinality among the individual
matching members as the conservative worst-case scenario?

------
Regards,
Alexander Korotkov
Supabase