Optimization outcome depends on the index order
On 21/12/2023 12:10, Alexander Korotkov wrote:
I took a closer look at the patch in [9]. I should drop my argument
about breaking the model, because add_path() already considers other
aspects than just costs. But I have two more notes about that patch:
1) It seems that you're determining the fact that the index path
should return strictly one row by checking path->rows <= 1.0 and
indexinfo->unique. Is it really guaranteed that in this case quals
are matching unique constraint? path->rows <= 1.0 could be just an
estimation error. Or one row could be correctly estimated, but it's
going to be selected by some quals matching unique constraint and
other quals in recheck. So, it seems there is a risk to select
suboptimal index due to this condition.
Operating inside the optimizer, we consider all estimations to be the truth.
This patch modifies only one place: having two equal assumptions, we just
choose the one that generally looks more stable.
Filtered tuples should be calculated and included in the cost of the path,
so the decision on the equality of the paths already takes the estimate of
these filtered tuples into account.
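For reference, the relevant logic in cost_index() (costsize.c) looks roughly
like the sketch below. Variable names are paraphrased and the surrounding
code is omitted, so treat it as an illustration rather than the exact source:

	/*
	 * Sketch of how cost_index() charges for quals the index cannot use:
	 * such quals become the Filter, and their evaluation cost is paid for
	 * every tuple the index scan fetches, so filtered tuples do show up
	 * in the path's total cost.
	 */
	cost_qual_eval(&qpqual_cost, qpquals, root); /* quals not handled by the index */
	startup_cost += qpqual_cost.startup;
	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
	run_cost += cpu_per_tuple * tuples_fetched;
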
2) Even for non-unique indexes this patch is putting new logic on top
of the subsequent code. How can we prove it's going to be a win?
That could lead, for instance, to dropping parallel-safe paths in
cases we didn't do so before.
Because we must trust all predictions made by the planner, we just
choose the most trustworthy path. According to the planner logic, it is
a path with a smaller selectivity. We can make mistakes anyway just
because of the nature of estimation.
Anyway, please start a separate thread if you're willing to put more
work into this.
Done
9. /messages/by-id/154f786a-06a0-4fb1-b8a4-16c66149731b%40postgrespro.ru
--
regards,
Andrei Lepikhov
Postgres Professional
Attachments:
v3-0001-Choose-an-index-path-with-the-best-selectivity-estim.patch
From 7b044de1449a5fdc450cb629caafb4e15ded7a93 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Mon, 27 Nov 2023 11:23:48 +0700
Subject: [PATCH] Choose an index path with the best selectivity estimation.
When the optimizer predicts only one row, prefer choosing a UNIQUE index.
In other cases, if the optimizer treats indexes as equal, make a last attempt
by selecting the index with the smaller selectivity. This decision removes the
dependency on the order of indexes in the index list (good for reproducing
some issues) and provides one more objective argument for choosing a specific
index.
---
src/backend/optimizer/util/pathnode.c | 42 +++++++++++++++++++
.../expected/drop-index-concurrently-1.out | 16 +++----
src/test/regress/expected/functional_deps.out | 39 +++++++++++++++++
src/test/regress/expected/join.out | 40 +++++++++---------
src/test/regress/sql/functional_deps.sql | 32 ++++++++++++++
5 files changed, 143 insertions(+), 26 deletions(-)
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 0b1d17b9d3..4b5aedd579 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -454,6 +454,48 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
costcmp = compare_path_costs_fuzzily(new_path, old_path,
STD_FUZZ_FACTOR);
+ /*
+ * Apply some heuristics on index paths.
+ */
+ if (IsA(new_path, IndexPath) && IsA(old_path, IndexPath))
+ {
+ IndexPath *inp = (IndexPath *) new_path;
+ IndexPath *iop = (IndexPath *) old_path;
+
+ if (new_path->rows <= 1.0 && old_path->rows <= 1.0)
+ {
+ /*
+ * When both paths are predicted to produce only one tuple,
+ * the optimiser should prefer choosing a unique index scan
+ * in all cases.
+ */
+ if (inp->indexinfo->unique && !iop->indexinfo->unique)
+ costcmp = COSTS_BETTER1;
+ else if (!inp->indexinfo->unique && iop->indexinfo->unique)
+ costcmp = COSTS_BETTER2;
+ else if (costcmp != COSTS_DIFFERENT)
+ /*
+ * If the optimiser doesn't have an obviously stable choice
+ * of unique index, increase the chance of avoiding mistakes
+ * by choosing an index with smaller selectivity.
+ * This option makes decision more conservative and looks
+ * debatable.
+ */
+ costcmp = (inp->indexselectivity < iop->indexselectivity) ?
+ COSTS_BETTER1 : COSTS_BETTER2;
+ }
+ else if (costcmp == COSTS_EQUAL)
+ /*
+ * The optimizer can't differ the value of two index paths.
+ * To avoid making a decision that is based on only an index
+ * order in the list, use some rational strategy based on
+ * selectivity: prefer touching fewer tuples on the disk to
+ * filtering them after.
+ */
+ costcmp = (inp->indexselectivity < iop->indexselectivity) ?
+ COSTS_BETTER1 : COSTS_BETTER2;
+ }
+
/*
* If the two paths compare differently for startup and total cost,
* then we want to keep both, and we can skip comparing pathkeys and
diff --git a/src/test/isolation/expected/drop-index-concurrently-1.out b/src/test/isolation/expected/drop-index-concurrently-1.out
index 1cb2250891..2392cdb033 100644
--- a/src/test/isolation/expected/drop-index-concurrently-1.out
+++ b/src/test/isolation/expected/drop-index-concurrently-1.out
@@ -12,13 +12,15 @@ step preps: PREPARE getrow_seqscan AS SELECT * FROM test_dc WHERE data = 34 ORDE
step begin: BEGIN;
step disableseq: SET enable_seqscan = false;
step explaini: EXPLAIN (COSTS OFF) EXECUTE getrow_idxscan;
-QUERY PLAN
-----------------------------------------------
-Sort
- Sort Key: id
- -> Index Scan using test_dc_data on test_dc
- Index Cond: (data = 34)
-(4 rows)
+QUERY PLAN
+---------------------------------------------
+Sort
+ Sort Key: id
+ -> Bitmap Heap Scan on test_dc
+ Recheck Cond: (data = 34)
+ -> Bitmap Index Scan on test_dc_data
+ Index Cond: (data = 34)
+(6 rows)
step enableseq: SET enable_seqscan = true;
step explains: EXPLAIN (COSTS OFF) EXECUTE getrow_seqscan;
diff --git a/src/test/regress/expected/functional_deps.out b/src/test/regress/expected/functional_deps.out
index 32381b8ae7..7057254278 100644
--- a/src/test/regress/expected/functional_deps.out
+++ b/src/test/regress/expected/functional_deps.out
@@ -230,3 +230,42 @@ EXECUTE foo;
ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT;
EXECUTE foo; -- fail
ERROR: column "articles.keywords" must appear in the GROUP BY clause or be used in an aggregate function
+/*
+ * Corner case of the PostgreSQL optimizer:
+ *
+ * ANDed clauses selectivity multiplication increases total selectivity error.
+ * If such non-true selectivity is so tiny that row estimation predicts the
+ * absolute minimum number of tuples (1), the optimizer can't choose between
+ * different indexes and picks a first from the index list (last created).
+ */
+CREATE TABLE t AS ( -- selectivity(c1)*selectivity(c2)*nrows <= 1
+ SELECT gs AS c1,
+ gs AS c2,
+ (gs % 10) AS c3, -- not in the good index.
+ (gs % 100) AS c4 -- not in the bad index.
+ FROM generate_series(1,1000) AS gs
+);
+CREATE INDEX bad ON t (c1,c2,c3);
+CREATE INDEX good ON t (c1,c2,c4);
+ANALYZE t;
+EXPLAIN (COSTS OFF) SELECT * FROM t WHERE c1=1 AND c2=1 AND c3=1 AND c4=1;
+ QUERY PLAN
+----------------------------------------------------
+ Index Scan using good on t
+ Index Cond: ((c1 = 1) AND (c2 = 1) AND (c4 = 1))
+ Filter: (c3 = 1)
+(3 rows)
+
+-- Hack: set the bad index to the first position in the index list.
+DROP INDEX bad;
+CREATE INDEX bad ON t (c1,c2,c3);
+ANALYZE t;
+EXPLAIN (COSTS OFF) SELECT * FROM t WHERE c1=1 AND c2=1 AND c3=1 AND c4=1;
+ QUERY PLAN
+----------------------------------------------------
+ Index Scan using good on t
+ Index Cond: ((c1 = 1) AND (c2 = 1) AND (c4 = 1))
+ Filter: (c3 = 1)
+(3 rows)
+
+DROP TABLE t CASCADE;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 2c73270143..32b33fabd3 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -8629,14 +8629,15 @@ analyze j2;
explain (costs off) select * from j1
inner join j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1;
- QUERY PLAN
------------------------------------------
+ QUERY PLAN
+---------------------------------------------------------
Merge Join
- Merge Cond: (j1.id1 = j2.id1)
- Join Filter: (j2.id2 = j1.id2)
- -> Index Scan using j1_id1_idx on j1
- -> Index Scan using j2_id1_idx on j2
-(5 rows)
+ Merge Cond: ((j1.id1 = j2.id1) AND (j1.id2 = j2.id2))
+ -> Index Only Scan using j1_pkey on j1
+ Filter: ((id1 % 1000) = 1)
+ -> Index Only Scan using j2_pkey on j2
+ Filter: ((id1 % 1000) = 1)
+(6 rows)
select * from j1
inner join j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
@@ -8651,15 +8652,16 @@ where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1;
explain (costs off) select * from j1
inner join j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1 and j2.id1 = any (array[1]);
- QUERY PLAN
-----------------------------------------------------
+ QUERY PLAN
+---------------------------------------------------------
Merge Join
- Merge Cond: (j1.id1 = j2.id1)
- Join Filter: (j2.id2 = j1.id2)
- -> Index Scan using j1_id1_idx on j1
- -> Index Scan using j2_id1_idx on j2
+ Merge Cond: ((j1.id1 = j2.id1) AND (j1.id2 = j2.id2))
+ -> Index Only Scan using j1_pkey on j1
+ Filter: ((id1 % 1000) = 1)
+ -> Index Only Scan using j2_pkey on j2
Index Cond: (id1 = ANY ('{1}'::integer[]))
-(6 rows)
+ Filter: ((id1 % 1000) = 1)
+(7 rows)
select * from j1
inner join j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
@@ -8674,12 +8676,12 @@ where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1 and j2.id1 = any (array[1]);
explain (costs off) select * from j1
inner join j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1 and j2.id1 >= any (array[1,5]);
- QUERY PLAN
--------------------------------------------------------
+ QUERY PLAN
+---------------------------------------------------------
Merge Join
- Merge Cond: (j1.id1 = j2.id1)
- Join Filter: (j2.id2 = j1.id2)
- -> Index Scan using j1_id1_idx on j1
+ Merge Cond: ((j1.id1 = j2.id1) AND (j1.id2 = j2.id2))
+ -> Index Only Scan using j1_pkey on j1
+ Filter: ((id1 % 1000) = 1)
-> Index Only Scan using j2_pkey on j2
Index Cond: (id1 >= ANY ('{1,5}'::integer[]))
Filter: ((id1 % 1000) = 1)
diff --git a/src/test/regress/sql/functional_deps.sql b/src/test/regress/sql/functional_deps.sql
index 406490b995..1be009b1ff 100644
--- a/src/test/regress/sql/functional_deps.sql
+++ b/src/test/regress/sql/functional_deps.sql
@@ -208,3 +208,35 @@ EXECUTE foo;
ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT;
EXECUTE foo; -- fail
+
+/*
+ * Corner case of the PostgreSQL optimizer:
+ *
+ * ANDed clauses selectivity multiplication increases total selectivity error.
+ * If such non-true selectivity is so tiny that row estimation predicts the
+ * absolute minimum number of tuples (1), the optimizer can't choose between
+ * different indexes and picks a first from the index list (last created).
+ */
+
+CREATE TABLE t AS ( -- selectivity(c1)*selectivity(c2)*nrows <= 1
+ SELECT gs AS c1,
+ gs AS c2,
+ (gs % 10) AS c3, -- not in the good index.
+ (gs % 100) AS c4 -- not in the bad index.
+ FROM generate_series(1,1000) AS gs
+);
+
+CREATE INDEX bad ON t (c1,c2,c3);
+CREATE INDEX good ON t (c1,c2,c4);
+ANALYZE t;
+
+EXPLAIN (COSTS OFF) SELECT * FROM t WHERE c1=1 AND c2=1 AND c3=1 AND c4=1;
+
+-- Hack: set the bad index to the first position in the index list.
+DROP INDEX bad;
+CREATE INDEX bad ON t (c1,c2,c3);
+ANALYZE t;
+
+EXPLAIN (COSTS OFF) SELECT * FROM t WHERE c1=1 AND c2=1 AND c3=1 AND c4=1;
+
+DROP TABLE t CASCADE;
--
2.43.0
On Fri, Dec 22, 2023 at 8:53 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru>
wrote:
On 21/12/2023 12:10, Alexander Korotkov wrote:
I took a closer look at the patch in [9]. I should drop my argument
about breaking the model, because add_path() already considers other
aspects than just costs. But I have two more notes about that patch:
1) It seems that you're determining the fact that the index path
should return strictly one row by checking path->rows <= 1.0 and
indexinfo->unique. Is it really guaranteed that in this case quals
are matching unique constraint? path->rows <= 1.0 could be just an
estimation error. Or one row could be correctly estimated, but it's
going to be selected by some quals matching unique constraint and
other quals in recheck. So, it seems there is a risk to select
suboptimal index due to this condition.
Operating inside the optimizer, we consider all estimations to be the truth.
This patch modifies only one place: having two equal assumptions, we just
choose the one that generally looks more stable.
Filtered tuples should be calculated and included in the cost of the path,
so the decision on the equality of the paths already takes the estimate of
these filtered tuples into account.
Even if the estimates are accurate, the conditions in the patch don't
guarantee there is actually a unique condition.
# create table t as select i/1000 a, i % 1000 b, i % 1000 c from
generate_series(1,1000000) i;
# create unique index t_unique_idx on t(a,b);
# create index t_another_idx on t(a,c);
# \d t
Table "public.t"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
a | integer | | |
b | integer | | |
c | integer | | |
Indexes:
"t_another_idx" btree (a, c)
"t_unique_idx" UNIQUE, btree (a, b)
# set enable_bitmapscan = off; explain select * from t where a = 1 and c = 1;
SET
Time: 0.459 ms
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using t_unique_idx on t (cost=0.42..1635.16 rows=1 width=12)
Index Cond: (a = 1)
Filter: (c = 1)
(3 rows)
2) Even for non-unique indexes this patch is putting new logic on top
of the subsequent code. How can we prove it's going to be a win?
That could lead, for instance, to dropping parallel-safe paths in
cases we didn't do so before.
Because we must trust all predictions made by the planner, we just
choose the most trustworthy path. According to the planner logic, it is
a path with a smaller selectivity. We can make mistakes anyway just
because of the nature of estimation.
Even if we need to take selectivity into account here, it's still not clear
why this should be on top of other logic later in add_path().
Anyway, please start a separate thread if you're willing to put more
work into this.
Done
Thanks.
------
Regards,
Alexander Korotkov
On 22/12/2023 11:48, Alexander Korotkov wrote:
Because we must trust all predictions made by the planner, we just
choose the most trustworthy path. According to the planner logic, it is
a path with a smaller selectivity. We can make mistakes anyway just
because of the nature of estimation.
Even if we need to take selectivity into account here, it's still not
clear why this should be on top of other logic later in add_path().
I got your point now, thanks for pointing it out. In the next version of
the patch, selectivity is used as a criterion only in the case of COSTS_EQUAL.
--
regards,
Andrei Lepikhov
Postgres Professional
Attachments:
v4-0001-Choose-an-index-path-with-the-best-selectivity-estim.patch
From 45bda9784d28dc9cec90c5b33285023a49850800 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Mon, 27 Nov 2023 11:23:48 +0700
Subject: [PATCH] Choose an index path with the best selectivity estimation.
When the optimizer predicts only one row, prefer choosing a UNIQUE index.
In other cases, if the optimizer treats indexes as equal, make a last attempt
by selecting the index with the smaller selectivity. This decision removes the
dependency on the order of indexes in the index list (good for reproducing
some issues) and provides one more objective argument for choosing a specific
index.
---
src/backend/optimizer/util/pathnode.c | 32 +++++++++++++++
src/test/regress/expected/functional_deps.out | 39 +++++++++++++++++++
src/test/regress/sql/functional_deps.sql | 32 +++++++++++++++
3 files changed, 103 insertions(+)
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 0b1d17b9d3..984c974b57 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -454,6 +454,38 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
costcmp = compare_path_costs_fuzzily(new_path, old_path,
STD_FUZZ_FACTOR);
+ /*
+ * Apply some heuristics on index paths.
+ */
+ if (costcmp == COSTS_EQUAL)
+ {
+ IndexPath *inp = (IndexPath *) new_path;
+ IndexPath *iop = (IndexPath *) old_path;
+
+ if (IsA(new_path, IndexPath) && IsA(old_path, IndexPath))
+ {
+ /*
+ * When both paths are predicted to produce only one tuple,
+ * the optimiser should prefer choosing a unique index scan
+ * in all cases.
+ */
+ if (inp->indexinfo->unique && !iop->indexinfo->unique)
+ costcmp = COSTS_BETTER1;
+ else if (!inp->indexinfo->unique && iop->indexinfo->unique)
+ costcmp = COSTS_BETTER2;
+ else if (costcmp != COSTS_DIFFERENT)
+ /*
+ * If the optimiser doesn't have an obviously stable choice
+ * of unique index, increase the chance of avoiding mistakes
+ * by choosing an index with smaller selectivity.
+ * This option makes decision more conservative and looks
+ * debatable.
+ */
+ costcmp = (inp->indexselectivity < iop->indexselectivity) ?
+ COSTS_BETTER1 : COSTS_BETTER2;
+ }
+ }
+
/*
* If the two paths compare differently for startup and total cost,
* then we want to keep both, and we can skip comparing pathkeys and
diff --git a/src/test/regress/expected/functional_deps.out b/src/test/regress/expected/functional_deps.out
index 32381b8ae7..7057254278 100644
--- a/src/test/regress/expected/functional_deps.out
+++ b/src/test/regress/expected/functional_deps.out
@@ -230,3 +230,42 @@ EXECUTE foo;
ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT;
EXECUTE foo; -- fail
ERROR: column "articles.keywords" must appear in the GROUP BY clause or be used in an aggregate function
+/*
+ * Corner case of the PostgreSQL optimizer:
+ *
+ * ANDed clauses selectivity multiplication increases total selectivity error.
+ * If such non-true selectivity is so tiny that row estimation predicts the
+ * absolute minimum number of tuples (1), the optimizer can't choose between
+ * different indexes and picks a first from the index list (last created).
+ */
+CREATE TABLE t AS ( -- selectivity(c1)*selectivity(c2)*nrows <= 1
+ SELECT gs AS c1,
+ gs AS c2,
+ (gs % 10) AS c3, -- not in the good index.
+ (gs % 100) AS c4 -- not in the bad index.
+ FROM generate_series(1,1000) AS gs
+);
+CREATE INDEX bad ON t (c1,c2,c3);
+CREATE INDEX good ON t (c1,c2,c4);
+ANALYZE t;
+EXPLAIN (COSTS OFF) SELECT * FROM t WHERE c1=1 AND c2=1 AND c3=1 AND c4=1;
+ QUERY PLAN
+----------------------------------------------------
+ Index Scan using good on t
+ Index Cond: ((c1 = 1) AND (c2 = 1) AND (c4 = 1))
+ Filter: (c3 = 1)
+(3 rows)
+
+-- Hack: set the bad index to the first position in the index list.
+DROP INDEX bad;
+CREATE INDEX bad ON t (c1,c2,c3);
+ANALYZE t;
+EXPLAIN (COSTS OFF) SELECT * FROM t WHERE c1=1 AND c2=1 AND c3=1 AND c4=1;
+ QUERY PLAN
+----------------------------------------------------
+ Index Scan using good on t
+ Index Cond: ((c1 = 1) AND (c2 = 1) AND (c4 = 1))
+ Filter: (c3 = 1)
+(3 rows)
+
+DROP TABLE t CASCADE;
diff --git a/src/test/regress/sql/functional_deps.sql b/src/test/regress/sql/functional_deps.sql
index 406490b995..1be009b1ff 100644
--- a/src/test/regress/sql/functional_deps.sql
+++ b/src/test/regress/sql/functional_deps.sql
@@ -208,3 +208,35 @@ EXECUTE foo;
ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT;
EXECUTE foo; -- fail
+
+/*
+ * Corner case of the PostgreSQL optimizer:
+ *
+ * ANDed clauses selectivity multiplication increases total selectivity error.
+ * If such non-true selectivity is so tiny that row estimation predicts the
+ * absolute minimum number of tuples (1), the optimizer can't choose between
+ * different indexes and picks a first from the index list (last created).
+ */
+
+CREATE TABLE t AS ( -- selectivity(c1)*selectivity(c2)*nrows <= 1
+ SELECT gs AS c1,
+ gs AS c2,
+ (gs % 10) AS c3, -- not in the good index.
+ (gs % 100) AS c4 -- not in the bad index.
+ FROM generate_series(1,1000) AS gs
+);
+
+CREATE INDEX bad ON t (c1,c2,c3);
+CREATE INDEX good ON t (c1,c2,c4);
+ANALYZE t;
+
+EXPLAIN (COSTS OFF) SELECT * FROM t WHERE c1=1 AND c2=1 AND c3=1 AND c4=1;
+
+-- Hack: set the bad index to the first position in the index list.
+DROP INDEX bad;
+CREATE INDEX bad ON t (c1,c2,c3);
+ANALYZE t;
+
+EXPLAIN (COSTS OFF) SELECT * FROM t WHERE c1=1 AND c2=1 AND c3=1 AND c4=1;
+
+DROP TABLE t CASCADE;
--
2.43.0
On Fri, Dec 22, 2023 at 7:24 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
On 22/12/2023 11:48, Alexander Korotkov wrote:
Because we must trust all predictions made by the planner, we just
choose the most trustworthy path. According to the planner logic, it is
a path with a smaller selectivity. We can make mistakes anyway just
because of the nature of estimation.
Even if we need to take selectivity into account here, it's still not
clear why this should be on top of other logic later in add_path().
I got your point now, thanks for pointing it out. In the next version of
the patch, selectivity is used as a criterion only in the case of COSTS_EQUAL.
It looks better now. But it's hard for me to judge these heuristics
in add_path(). Tom, what do you think about this?
------
Regards,
Alexander Korotkov
On 25/12/2023 18:36, Alexander Korotkov wrote:
On Fri, Dec 22, 2023 at 7:24 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
On 22/12/2023 11:48, Alexander Korotkov wrote:
Because we must trust all predictions made by the planner, we just
choose the most trustworthy path. According to the planner logic, it is
a path with a smaller selectivity. We can make mistakes anyway just
because of the nature of estimation.
Even if we need to take selectivity into account here, it's still not
clear why this should be on top of other logic later in add_path().
I got your point now, thanks for pointing it out. In the next version of
the patch, selectivity is used as a criterion only in the case of COSTS_EQUAL.
It looks better now. But it's hard for me to judge these heuristics
in add_path(). Tom, what do you think about this?
Just food for thought:
Another approach I considered was to initialize the relation's index list
according to some consistent rule: place unique indexes at the head of the
list, order indexes by the number of columns involved, and sort them in some
lexical order (a rough sketch follows below).
But IMO, the implemented approach looks better.
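For illustration only, such an ordering rule could be sketched as a
list_sort() comparator applied once the index list has been built (say, at
the end of get_relation_info()). The comparator below is hypothetical and
not part of the attached patches:

	/*
	 * Hypothetical sketch; assumes postgres.h, nodes/pathnodes.h,
	 * nodes/pg_list.h and utils/lsyscache.h.
	 */
	static int
	index_ordering_cmp(const ListCell *a, const ListCell *b)
	{
		const IndexOptInfo *ia = (const IndexOptInfo *) lfirst(a);
		const IndexOptInfo *ib = (const IndexOptInfo *) lfirst(b);

		/* Unique indexes go to the head of the list. */
		if (ia->unique != ib->unique)
			return ia->unique ? -1 : 1;
		/* Then order by the number of key columns. */
		if (ia->nkeycolumns != ib->nkeycolumns)
			return ia->nkeycolumns - ib->nkeycolumns;
		/* Finally, fall back to lexical order of the index names. */
		return strcmp(get_rel_name(ia->indexoid), get_rel_name(ib->indexoid));
	}

	static void
	order_index_list(RelOptInfo *rel)
	{
		list_sort(rel->indexlist, index_ordering_cmp);
	}
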
--
regards,
Andrei Lepikhov
Postgres Professional