Modify an incorrect regression test case in the group by key value elimination function

Started by songjinzhou11 months ago4 messages
#1songjinzhou
tsinghualucky912@foxmail.com
1 attachment(s)

Dear hackers: 

Two months ago, we enhanced the group by key value elimination function. This is a very useful function. When I was learning, I found a regression test case that was not suitable, as follows:

-- When there are multiple supporting unique indexes and the GROUP BY contains
-- columns to cover all of those, ensure we pick the index with the least
-- number of columns so that we can remove more columns from the GROUP BY.
explain (costs off) select a,b,c from t3 group by a,b,c;
      QUERY PLAN      
----------------------
 HashAggregate
   Group Key: c
   ->  Seq Scan on t3
(3 rows)

-- As above but try ordering the columns differently to ensure we get the
-- same result.
explain (costs off) select a,b,c from t3 group by c,a,b;
      QUERY PLAN      
----------------------
 HashAggregate
   Group Key: c
   ->  Seq Scan on t3
(3 rows)

Because the table structure of t3 is defined as follows(Its PK is deferrable):

postgres=# \d+ t3
                                          Table "pg_temp_1.t3"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 a      | integer |           | not null |         | plain   |             |              |
 b      | integer |           | not null |         | plain   |             |              |
 c      | integer |           | not null |         | plain   |             |              |
Indexes:
    "t3_pkey" PRIMARY KEY, btree (a, b) DEFERRABLE
    "t3_c_uidx" UNIQUE, btree (c)
Not-null constraints:
    "t3_a_not_null" NOT NULL "a"
    "t3_b_not_null" NOT NULL "b"
    "t3_c_not_null" NOT NULL "c"
Access method: heap

postgres=#

I think this test case does not fully reflect the original meaning. So I made a small change to this and look forward to your feedback. Thanks!

Attachments:

v1_0001-Modify-an-incorrect-regression-test-case-in-the-grou.patchapplication/octet-stream; charset=utf-8; name=v1_0001-Modify-an-incorrect-regression-test-case-in-the-grou.patchDownload
From 86963d05c59d6f24b9745efe6cbad6d2c4b15dc3 Mon Sep 17 00:00:00 2001
From: songjinzhou <tsinghualucky912@foxmail.com>
Date: Mon, 17 Feb 2025 23:07:06 -0800
Subject: [PATCH] Modify an incorrect regression test case in the group by key
 value elimination function

---
 src/test/regress/expected/aggregates.out | 13 +++++++------
 src/test/regress/sql/aggregates.sql      |  5 +++--
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index f2fb66388c..ce826af3cd 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1472,22 +1472,23 @@ explain (costs off) select b,c from t3 group by b,c;
 -- When there are multiple supporting unique indexes and the GROUP BY contains
 -- columns to cover all of those, ensure we pick the index with the least
 -- number of columns so that we can remove more columns from the GROUP BY.
-explain (costs off) select a,b,c from t3 group by a,b,c;
+alter table t2 alter column z set not null, add constraint t2_z_uidx unique (z);
+explain (costs off) select x,y,z from t2 group by x,y,z;
       QUERY PLAN      
 ----------------------
  HashAggregate
-   Group Key: c
-   ->  Seq Scan on t3
+   Group Key: z
+   ->  Seq Scan on t2
 (3 rows)
 
 -- As above but try ordering the columns differently to ensure we get the
 -- same result.
-explain (costs off) select a,b,c from t3 group by c,a,b;
+explain (costs off) select x,y,z from t2 group by z,x,y;
       QUERY PLAN      
 ----------------------
  HashAggregate
-   Group Key: c
-   ->  Seq Scan on t3
+   Group Key: z
+   ->  Seq Scan on t2
 (3 rows)
 
 -- Ensure we don't use a partial index as proof of functional dependency
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 77168bcc74..784b75728a 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -520,11 +520,12 @@ explain (costs off) select b,c from t3 group by b,c;
 -- When there are multiple supporting unique indexes and the GROUP BY contains
 -- columns to cover all of those, ensure we pick the index with the least
 -- number of columns so that we can remove more columns from the GROUP BY.
-explain (costs off) select a,b,c from t3 group by a,b,c;
+alter table t2 alter column z set not null, add constraint t2_z_uidx unique (z);
+explain (costs off) select x,y,z from t2 group by x,y,z;
 
 -- As above but try ordering the columns differently to ensure we get the
 -- same result.
-explain (costs off) select a,b,c from t3 group by c,a,b;
+explain (costs off) select x,y,z from t2 group by z,x,y;
 
 -- Ensure we don't use a partial index as proof of functional dependency
 drop index t3_c_uidx;
-- 
2.43.0

#2Japin Li
japinli@hotmail.com
In reply to: songjinzhou (#1)
Re: Modify an incorrect regression test case in the group by key value elimination function

On Tue, 18 Feb 2025 at 15:40, "songjinzhou" <tsinghualucky912@foxmail.com> wrote:

Dear hackers:

Two months ago, we enhanced the group by key value elimination function. This is a very useful function. When I was
learning, I found a regression test case that was not suitable, as follows:

-- When there are multiple supporting unique indexes and the GROUP BY contains
-- columns to cover all of those, ensure we pick the index with the least
-- number of columns so that we can remove more columns from the GROUP BY.
explain (costs off) select a,b,c from t3 group by a,b,c;
QUERY PLAN
----------------------
HashAggregate
Group Key: c
-> Seq Scan on t3
(3 rows)

-- As above but try ordering the columns differently to ensure we get the
-- same result.
explain (costs off) select a,b,c from t3 group by c,a,b;
QUERY PLAN
----------------------
HashAggregate
Group Key: c
-> Seq Scan on t3
(3 rows)

Because the table structure of t3 is defined as follows(Its PK is deferrable):

postgres=# \d+ t3
Table "pg_temp_1.t3"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
a | integer | | not null | | plain | | |
b | integer | | not null | | plain | | |
c | integer | | not null | | plain | | |
Indexes:
"t3_pkey" PRIMARY KEY, btree (a, b) DEFERRABLE
"t3_c_uidx" UNIQUE, btree (c)
Not-null constraints:
"t3_a_not_null" NOT NULL "a"
"t3_b_not_null" NOT NULL "b"
"t3_c_not_null" NOT NULL "c"
Access method: heap

postgres=#

I think this test case does not fully reflect the original meaning. So I made a small change to this and look forward to
your feedback. Thanks!

Yeah, the primary key of t3 is deferred, so only the t3_c_uidx can be used.
The test case is inconsistent with comments.

It seems that the t2 does not have a unique index on z, do you forget to
create it in the patch?

--
Regrads,
Japin Li

#3Zhang Mingli
zmlpostgres@gmail.com
In reply to: Japin Li (#2)
1 attachment(s)
Re: Modify an incorrect regression test case in the group by key value elimination function

On Feb 18, 2025 at 16:17 +0800, Japin Li <japinli@hotmail.com>, wrote:

On Tue, 18 Feb 2025 at 15:40, "songjinzhou" <tsinghualucky912@foxmail.com> wrote:

Dear hackers:

Two months ago, we enhanced the group by key value elimination function. This is a very useful function. When I was
learning, I found a regression test case that was not suitable, as follows:

-- When there are multiple supporting unique indexes and the GROUP BY contains
-- columns to cover all of those, ensure we pick the index with the least
-- number of columns so that we can remove more columns from the GROUP BY.
explain (costs off) select a,b,c from t3 group by a,b,c;
QUERY PLAN
----------------------
HashAggregate
Group Key: c
-> Seq Scan on t3
(3 rows)

-- As above but try ordering the columns differently to ensure we get the
-- same result.
explain (costs off) select a,b,c from t3 group by c,a,b;
QUERY PLAN
----------------------
HashAggregate
Group Key: c
-> Seq Scan on t3
(3 rows)

Because the table structure of t3 is defined as follows(Its PK is deferrable):

postgres=# \d+ t3
Table "pg_temp_1.t3"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
a | integer | | not null | | plain | | |
b | integer | | not null | | plain | | |
c | integer | | not null | | plain | | |
Indexes:
"t3_pkey" PRIMARY KEY, btree (a, b) DEFERRABLE
"t3_c_uidx" UNIQUE, btree (c)
Not-null constraints:
"t3_a_not_null" NOT NULL "a"
"t3_b_not_null" NOT NULL "b"
"t3_c_not_null" NOT NULL "c"
Access method: heap

postgres=#

I think this test case does not fully reflect the original meaning. So I made a small change to this and look forward to
your feedback. Thanks!

Hi,

Good catch.

Yeah, the primary key of t3 is deferred, so only the t3_c_uidx can be used.
The test case is inconsistent with comments.

It seems that the t2 does not have a unique index on z, do you forget to
create it in the patch?

While there isn't an unique index, an unique constraint with a NOT NULL on that column should serve the same purpose.

  +alter table t2 alter column z set not null, add constraint t2_z_uidx unique (z);

In patch v2, I made a slight adjustment: creating a unique index on t3 will handle multiple choice scenarios.

Thanks for the patch.

--
Zhang Mingli
HashData

Attachments:

v2-0001-Modify-an-incorrect-regression-test-case-in-the-grou.patchapplication/octet-streamDownload
From c92eba894be3f4d6b238917e6174373f1404a0fc Mon Sep 17 00:00:00 2001
From: songjinzhou <tsinghualucky912@foxmail.com>
Date: Tue, 18 Feb 2025 19:02:36 +0800
Subject: [PATCH] Modify an incorrect regression test case in the group by key
 value elimination function

---
 src/test/regress/expected/aggregates.out | 2 ++
 src/test/regress/sql/aggregates.sql      | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index f2fb66388c..ca95ba1343 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1472,6 +1472,7 @@ explain (costs off) select b,c from t3 group by b,c;
 -- When there are multiple supporting unique indexes and the GROUP BY contains
 -- columns to cover all of those, ensure we pick the index with the least
 -- number of columns so that we can remove more columns from the GROUP BY.
+create unique index t3_b_c_uidx on t3(b, c);
 explain (costs off) select a,b,c from t3 group by a,b,c;
       QUERY PLAN      
 ----------------------
@@ -1490,6 +1491,7 @@ explain (costs off) select a,b,c from t3 group by c,a,b;
    ->  Seq Scan on t3
 (3 rows)
 
+drop index t3_b_c_uidx;
 -- Ensure we don't use a partial index as proof of functional dependency
 drop index t3_c_uidx;
 create index t3_c_uidx on t3 (c) where c > 0;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 77168bcc74..8acd64d34e 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -520,12 +520,14 @@ explain (costs off) select b,c from t3 group by b,c;
 -- When there are multiple supporting unique indexes and the GROUP BY contains
 -- columns to cover all of those, ensure we pick the index with the least
 -- number of columns so that we can remove more columns from the GROUP BY.
+create unique index t3_b_c_uidx on t3(b, c);
 explain (costs off) select a,b,c from t3 group by a,b,c;
 
 -- As above but try ordering the columns differently to ensure we get the
 -- same result.
 explain (costs off) select a,b,c from t3 group by c,a,b;
 
+drop index t3_b_c_uidx;
 -- Ensure we don't use a partial index as proof of functional dependency
 drop index t3_c_uidx;
 create index t3_c_uidx on t3 (c) where c > 0;
-- 
2.34.1

#4David Rowley
dgrowleyml@gmail.com
In reply to: songjinzhou (#1)
Re: Modify an incorrect regression test case in the group by key value elimination function

On Tue, 18 Feb 2025 at 20:40, songjinzhou <tsinghualucky912@foxmail.com> wrote:

Two months ago, we enhanced the group by key value elimination function. This is a very useful function. When I was learning, I found a regression test case that was not suitable, as follows:

Because the table structure of t3 is defined as follows(Its PK is deferrable):

I think this test case does not fully reflect the original meaning. So I made a small change to this and look forward to your feedback. Thanks!

That's my mistake. Thanks for spotting it.

I looked at your patch and felt that the flow would be better if all
those tests used t2 rather than t3. That way we don't need to add the
NOT NULL constraint to both tables, just t2. I wrote the patch to do
it that way and pushed it.

David