Redundant Unique plan node for table with a unique index

Started by Damir Belyalov over 2 years ago · 5 messages
#1 Damir Belyalov
dam.bel07@gmail.com
1 attachment(s)

Hello!

There is a table with a unique index on it, and we have a query that
searches for DISTINCT values on the columns of that unique index. Example:

create table a (n int);
insert into a (n) select x from generate_series(1, 140000) as g(x);
create unique index on a (n);
explain select distinct n from a;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Unique  (cost=0.42..6478.42 rows=140000 width=4)
   ->  Index Only Scan using a_n_idx on a  (cost=0.42..6128.42 rows=140000 width=4)
(2 rows)

We can see that the Unique node is redundant in this case. So I implemented a
simple patch that removes the Unique node from the plan.
After patch:

explain select distinct n from a;
QUERY PLAN
---------------------------------------------------------
Seq Scan on a (cost=0.00..2020.00 rows=140000 width=4)
(1 row)

The patch is rather simple and doesn't consider queries with joins. The
criterion for removing the Unique node is that the set of Vars in the
DISTINCT clause contains all columns of a unique index on the same table.
Another example:
CREATE TABLE a (n int, m int);
CREATE UNIQUE INDEX ON a (n);
SELECT DISTINCT n, m FROM a;
The Unique node can be removed because n is contained in (n, m).

The patch doesn't handle these cases:
1. DISTINCT ON (expr),
because this case can require grouping.
2. Subqueries,
because this case can require grouping:
CREATE TABLE a (n int);
CREATE UNIQUE INDEX ON a (n);
SELECT DISTINCT g FROM (SELECT * FROM a) as g;
3. Joins, because handling them requires more complex code.
Example:
SELECT DISTINCT a.n1 FROM a JOIN b ON a.n1 = b.n1;
where a.n1 and b.n1 each have a unique index and the join qual is on
those index columns,
or
a has a unique index on n1 and b is "unique for a" (at most one
matching tuple) over the join qual.

I am wondering whether there are opportunities for further development of this
patch, in particular for the JOIN cases.
For several levels of JOINs we would need to track which set of columns is
unique for every joinrel in the query. In general terms, I identified two
cases in which a joinrel "preserves" a unique index from a table: when the
tables are joined on the unique index columns, and when one table has a
unique index and is "unique for" the other table (at most one matching
tuple) over the join qual.

Regards,
Damir Belyalov
Postgres Professional

Attachments:

0001-Redundant-Unique-Node.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 44efb1f4ebc..8f9912f48ad 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1266,6 +1266,47 @@ preprocess_phv_expression(PlannerInfo *root, Expr *expr)
 	return (Expr *) preprocess_expression(root, (Node *) expr, EXPRKIND_PHV);
 }
 
+static bool
+is_distinct_redundant(Query *parse, RelOptInfo *current_rel, PathTarget *sort_input_target)
+{
+	List	 *distinct_vars = NIL;
+	ListCell *lc1, *lc2;
+
+	/* Doesn't process DISTINCT ON because it can need grouping */
+	if (parse->hasDistinctOn)
+		return false;
+
+	/* Fetch Vars from DISTINCT clause */
+	foreach(lc1, sort_input_target->exprs)
+	{
+		Var *distinct_expr = (Var *) lfirst(lc1);
+
+		if (IsA(distinct_expr, Var))
+			distinct_vars = list_append_unique_int(distinct_vars, distinct_expr->varattno);
+		else
+			/* Doesn't process this case because it can need grouping */
+			return false;
+	}
+
+	foreach(lc2, current_rel->indexlist)
+	{
+		IndexOptInfo *indexinfo = (IndexOptInfo *) lfirst(lc2);
+		List		 *unique_indexkeys = NIL;
+		int			  i;
+
+		if (indexinfo->unique)
+		{
+			for (i = 0; i < indexinfo->ncolumns; i++)
+				unique_indexkeys = list_append_unique_int(unique_indexkeys, indexinfo->indexkeys[i]);
+
+			if (list_difference_int(unique_indexkeys, distinct_vars) == NIL)
+				return true;
+		}
+	}
+
+	return false;
+}
+
 /*--------------------
  * grouping_planner
  *	  Perform planning steps related to grouping, aggregation, etc.
@@ -1694,8 +1735,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		 */
 		if (parse->distinctClause)
 		{
-			current_rel = create_distinct_paths(root,
-												current_rel);
+			if (!is_distinct_redundant(parse, current_rel, sort_input_target))
+				current_rel = create_distinct_paths(root,
+													current_rel);
 		}
 	}							/* end of if (setOperations) */
 
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9b8638f286a..49223c5be10 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5645,18 +5645,15 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
+             QUERY PLAN             
+------------------------------------
  Merge Right Join
    Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
+   ->  Index Scan using b_pkey on b
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
-(9 rows)
+(6 rows)
 
 -- join removal is not possible here
 explain (costs off)
#2 Daniel Gustafsson
daniel@yesql.se
In reply to: Damir Belyalov (#1)
Re: Redundant Unique plan node for table with a unique index

On 13 Sep 2023, at 15:22, Damir Belyalov <dam.bel07@gmail.com> wrote:

There is a table with a unique index on it and we have a query that searching DISTINCT values on this table on columns of unique index.

We can see that Unique node is redundant for this case. So I implemented a simple patch that removes Unique node from the plan.

Is this query pattern common enough to warrant spending time on in the planner
(are there perhaps ORMs that generate such)? Have you measured the overhead of
this?

--
Daniel Gustafsson

#3 David Rowley
dgrowleyml@gmail.com
In reply to: Damir Belyalov (#1)
Re: Redundant Unique plan node for table with a unique index

On Thu, 14 Sept 2023 at 02:28, Damir Belyalov <dam.bel07@gmail.com> wrote:

create table a (n int);
insert into a (n) select x from generate_series(1, 140000) as g(x);
create unique index on a (n);
explain select distinct n from a;
QUERY PLAN
------------------------------------------------------------------------------------
Unique (cost=0.42..6478.42 rows=140000 width=4)
-> Index Only Scan using a_n_idx on a (cost=0.42..6128.42 rows=140000 width=4)
(2 rows)

We can see that Unique node is redundant for this case. So I implemented a simple patch that removes Unique node from the plan.

I don't think this is a good way to do this. The method you're using
only supports this optimisation when querying a table directly. If
there were subqueries, joins, etc then it wouldn't work as there are
no unique indexes. You should probably have a look at [1] to see
further details of an alternative method without the said limitations.

David

[1]: /messages/by-id/flat/CAKU4AWqZvSyxroHkbpiHSCEAY2C41dG7VWs=c188KKznSK_2Zg@mail.gmail.com

#4 Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#3)
Re: Redundant Unique plan node for table with a unique index

I don't think this is a good way to do this. The method you're using
only supports this optimisation when querying a table directly. If
there were subqueries, joins, etc then it wouldn't work as there are
no unique indexes. You should probably have a look at [1] to see
further details of an alternative method without the said limitations.

David

[1]
/messages/by-id/flat/CAKU4AWqZvSyxroHkbpiHSCEAY2C41dG7VWs=c188KKznSK_2Zg@mail.gmail.com

The nullable tracking blocker has probably been removed by varnullingrels,
so I will start working on the UniqueKey stuff very soon. Thank you, David,
for remembering this feature!

--
Best Regards
Andy Fan

#5 Damir Belyalov
dam.bel07@gmail.com
In reply to: Andy Fan (#4)
Re: Redundant Unique plan node for table with a unique index

Thank you for the feedback and for thread [1].

Regards,
Damir Belyalov
Postgres Professional