Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.
Hi all,
I find another issue as $SUBJECT when I work on [1]/messages/by-id/18568-2a9afb6b9f7e6ed3@postgresql.org -- Thanks, Tender Wang.
I will use [1]/messages/by-id/18568-2a9afb6b9f7e6ed3@postgresql.org -- Thanks, Tender Wang SQL to show the problem.
CREATE COLLATION case_insensitive (
provider = icu,
locale = 'und-u-ks-level2',
deterministic = false
);
CREATE TABLE pagg_tab (c text collate case_insensitive) PARTITION BY LIST(c
collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('a', 'b',
'c', 'd');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('e', 'f',
'A');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('B', 'C',
'D', 'E');
INSERT INTO pagg_tab SELECT substr('abcdeABCDE', (i % 10) +1 , 1) FROM
generate_series(0, 2999) i;
ANALYZE pagg_tab;
set max_parallel_workers_per_gather = 0;
postgres=# show enable_partitionwise_join ;
enable_partitionwise_join
---------------------------
off
(1 row)
postgres=# explain select count(*) from pagg_tab t1 join pagg_tab t2 on
t1.c = t2.c;
QUERY PLAN
--------------------------------------------------------------------------------------------
Aggregate (cost=24915.00..24915.01 rows=1 width=8)
-> Hash Join (cost=97.50..20415.00 rows=1800000 width=0)
Hash Cond: (t1.c = t2.c)
-> Append (cost=0.00..60.00 rows=3000 width=2)
-> Seq Scan on pagg_tab_p2 t1_1 (cost=0.00..9.00 rows=600
width=2)
-> Seq Scan on pagg_tab_p3 t1_2 (cost=0.00..18.00
rows=1200 width=2)
-> Seq Scan on pagg_tab_p1 t1_3 (cost=0.00..18.00
rows=1200 width=2)
-> Hash (cost=60.00..60.00 rows=3000 width=2)
-> Append (cost=0.00..60.00 rows=3000 width=2)
-> Seq Scan on pagg_tab_p2 t2_1 (cost=0.00..9.00
rows=600 width=2)
-> Seq Scan on pagg_tab_p3 t2_2 (cost=0.00..18.00
rows=1200 width=2)
-> Seq Scan on pagg_tab_p1 t2_3 (cost=0.00..18.00
rows=1200 width=2)
(12 rows)
postgres=# select count(*) from pagg_tab t1 join pagg_tab t2 on t1.c =
t2.c;
count
---------
1800000
(1 row)
But if we set enable_partitionwise_join is on, the result is different, as
below.
postgres=# set enable_partitionwise_join = on;
SET
postgres=# explain select count(*) from pagg_tab t1 join pagg_tab t2 on
t1.c = t2.c;
QUERY PLAN
--------------------------------------------------------------------------------------------
Aggregate (cost=17010.00..17010.01 rows=1 width=8)
-> Append (cost=16.50..14760.00 rows=900000 width=0)
-> Hash Join (cost=16.50..2052.00 rows=180000 width=0)
Hash Cond: (t1_1.c = t2_1.c)
-> Seq Scan on pagg_tab_p2 t1_1 (cost=0.00..9.00 rows=600
width=2)
-> Hash (cost=9.00..9.00 rows=600 width=2)
-> Seq Scan on pagg_tab_p2 t2_1 (cost=0.00..9.00
rows=600 width=2)
-> Hash Join (cost=33.00..4104.00 rows=360000 width=0)
Hash Cond: (t1_2.c = t2_2.c)
-> Seq Scan on pagg_tab_p3 t1_2 (cost=0.00..18.00
rows=1200 width=2)
-> Hash (cost=18.00..18.00 rows=1200 width=2)
-> Seq Scan on pagg_tab_p3 t2_2 (cost=0.00..18.00
rows=1200 width=2)
-> Hash Join (cost=33.00..4104.00 rows=360000 width=0)
Hash Cond: (t1_3.c = t2_3.c)
-> Seq Scan on pagg_tab_p1 t1_3 (cost=0.00..18.00
rows=1200 width=2)
-> Hash (cost=18.00..18.00 rows=1200 width=2)
-> Seq Scan on pagg_tab_p1 t2_3 (cost=0.00..18.00
rows=1200 width=2)
(17 rows)
postgres=# select count(*) from pagg_tab t1 join pagg_tab t2 on t1.c =
t2.c;
count
--------
900000
(1 row)
I think the root cause of this thread and [1]/messages/by-id/18568-2a9afb6b9f7e6ed3@postgresql.org -- Thanks, Tender Wang are same. We don't use the
Partition Key collation but column's
collation to fill the RelOptInfo partexprs field in
set_baserel_partition_key_exprs().
If the Partition Key definition is same as column definition, which most
times is,
that will be ok. But if it's not, this thread issue will arise.
As far as I know, only partition pruning logic considers not only call
equal(), but also check collation match.
Other codes only call equal() to check if the exprs match the partition key.
For example, in this thread case, match_expr_to_partition_keys() think the
expr match the partition key:
if (equal(lfirst(lc), expr))
return cnt;
Although We can fix this issue like [1]/messages/by-id/18568-2a9afb6b9f7e6ed3@postgresql.org -- Thanks, Tender Wang, I think why not directly use the
partkey->partcollation[cnt], which its value is
same with pg_partitioned_table's partcollation. I tried this to fix [1]/messages/by-id/18568-2a9afb6b9f7e6ed3@postgresql.org -- Thanks, Tender Wang,
but at that time, I was unsure if it was the correct fix.
Until I find this thread issue, I think we should do it this way.
In the attached patch, I include this thread test and [1]/messages/by-id/18568-2a9afb6b9f7e6ed3@postgresql.org -- Thanks, Tender Wang test case.
I have two questions in my head:
1. Does partition pruning logic still check the collation match with this
patch.
2. icu can work on all platform?
Any thoughts?
[1]: /messages/by-id/18568-2a9afb6b9f7e6ed3@postgresql.org -- Thanks, Tender Wang
/messages/by-id/18568-2a9afb6b9f7e6ed3@postgresql.org
--
Thanks,
Tender Wang
Attachments:
0001-Fix-wrong-result-due-different-collation-between-col.patchapplication/octet-stream; name=0001-Fix-wrong-result-due-different-collation-between-col.patchDownload
From 2d584f9c67ce6d4e4631f4d9358975bc8ce9e85e Mon Sep 17 00:00:00 2001
From: Tender Wang <tndrwang@gmail.com>
Date: Wed, 23 Oct 2024 21:41:40 +0800
Subject: [PATCH] Fix wrong result due different collation between columan and
partition key.
---
src/backend/optimizer/util/plancat.c | 2 +-
.../regress/expected/partition_aggregate.out | 75 +++++++++++++++++++
src/test/regress/sql/partition_aggregate.sql | 31 ++++++++
3 files changed, 107 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b913f91ff0..a96a4396f6 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2537,7 +2537,7 @@ set_baserel_partition_key_exprs(Relation relation,
partexpr = (Expr *) makeVar(varno, attno,
partkey->parttypid[cnt],
partkey->parttypmod[cnt],
- partkey->parttypcoll[cnt], 0);
+ partkey->partcollation[cnt], 0);
}
else
{
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index 5f2c0cf578..4f10066c48 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -1518,3 +1518,78 @@ SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) <
21 | 6000 | 6.0000000000000000 | 1000
(6 rows)
+SET enable_partitionwise_aggregate TO true;
+SET enable_partitionwise_join TO true;
+SET max_parallel_workers_per_gather TO 0;
+SET enable_incremental_sort TO off;
+CREATE COLLATION case_insensitive (
+ provider = icu,
+ locale = 'und-u-ks-level2',
+ deterministic = false
+);
+CREATE TABLE pagg_tab3 (c text collate case_insensitive) PARTITION BY LIST(c collate "C");
+CREATE TABLE pagg_tab3_p1 PARTITION OF pagg_tab3 FOR VALUES IN ('a', 'b', 'c', 'd');
+CREATE TABLE pagg_tab3_p2 PARTITION OF pagg_tab3 FOR VALUES IN ('e', 'f', 'A');
+CREATE TABLE pagg_tab3_p3 PARTITION OF pagg_tab3 FOR VALUES IN ('B', 'C', 'D', 'E');
+INSERT INTO pagg_tab3 SELECT substr('abcdeABCDE', (i % 10) + 1 , 1) FROM generate_series(0, 2999) i;
+ANALYZE pagg_tab3;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Sort
+ Sort Key: (upper(pagg_tab3.c)) COLLATE case_insensitive
+ -> Finalize GroupAggregate
+ Group Key: pagg_tab3.c
+ -> Sort
+ Sort Key: pagg_tab3.c COLLATE case_insensitive
+ -> Append
+ -> Partial HashAggregate
+ Group Key: pagg_tab3.c
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3
+ -> Partial HashAggregate
+ Group Key: pagg_tab3_1.c
+ -> Seq Scan on pagg_tab3_p3 pagg_tab3_1
+ -> Partial HashAggregate
+ Group Key: pagg_tab3_2.c
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_2
+(16 rows)
+
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ upper | count
+-------+-------
+ A | 600
+ B | 600
+ C | 600
+ D | 600
+ E | 600
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 on t1.c = t2.c;
+ QUERY PLAN
+-------------------------------------------------------
+ Aggregate
+ -> Hash Join
+ Hash Cond: (t1.c = t2.c)
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t1_1
+ -> Seq Scan on pagg_tab3_p3 t1_2
+ -> Seq Scan on pagg_tab3_p1 t1_3
+ -> Hash
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p3 t2_2
+ -> Seq Scan on pagg_tab3_p1 t2_3
+(12 rows)
+
+SELECT count(*) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 on t1.c = t2.c;
+ count
+---------
+ 1800000
+(1 row)
+
+RESET enable_partitionwise_aggregate;
+RESET enable_partitionwise_join;
+RESET max_parallel_workers_per_gather;
+RESET enable_incremental_sort;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee24..3c0c65de67 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -334,3 +334,34 @@ RESET parallel_setup_cost;
EXPLAIN (COSTS OFF)
SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3;
SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3;
+
+SET enable_partitionwise_aggregate TO true;
+SET enable_partitionwise_join TO true;
+SET max_parallel_workers_per_gather TO 0;
+SET enable_incremental_sort TO off;
+
+CREATE COLLATION case_insensitive (
+ provider = icu,
+ locale = 'und-u-ks-level2',
+ deterministic = false
+);
+
+CREATE TABLE pagg_tab3 (c text collate case_insensitive) PARTITION BY LIST(c collate "C");
+CREATE TABLE pagg_tab3_p1 PARTITION OF pagg_tab3 FOR VALUES IN ('a', 'b', 'c', 'd');
+CREATE TABLE pagg_tab3_p2 PARTITION OF pagg_tab3 FOR VALUES IN ('e', 'f', 'A');
+CREATE TABLE pagg_tab3_p3 PARTITION OF pagg_tab3 FOR VALUES IN ('B', 'C', 'D', 'E');
+INSERT INTO pagg_tab3 SELECT substr('abcdeABCDE', (i % 10) + 1 , 1) FROM generate_series(0, 2999) i;
+ANALYZE pagg_tab3;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 on t1.c = t2.c;
+
+SELECT count(*) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 on t1.c = t2.c;
+
+RESET enable_partitionwise_aggregate;
+RESET enable_partitionwise_join;
+RESET max_parallel_workers_per_gather;
+RESET enable_incremental_sort;
\ No newline at end of file
--
2.25.1
Tender Wang <tndrwang@gmail.com> 于2024年10月23日周三 21:48写道:
Hi all,
I think the root cause of this thread and [1] are same. We don't use the
Partition Key collation but column's
collation to fill the RelOptInfo partexprs field in
set_baserel_partition_key_exprs().
If the Partition Key definition is same as column definition, which most
times is,
that will be ok. But if it's not, this thread issue will arise.As far as I know, only partition pruning logic considers not only call
equal(), but also check collation match.
Other codes only call equal() to check if the exprs match the partition
key.
For example, in this thread case, match_expr_to_partition_keys() think the
expr match the partition key:
if (equal(lfirst(lc), expr))
return cnt;Although We can fix this issue like [1], I think why not directly use the
partkey->partcollation[cnt], which its value is
same with pg_partitioned_table's partcollation. I tried this to fix [1],
but at that time, I was unsure if it was the correct fix.Until I find this thread issue, I think we should do it this way.
In the attached patch, I include this thread test and [1] test case.
Hi
In the last email, I proposed to use partcollation directly. But I ignore
the case that partkey is not a simple column reference.
for expample:
create table coll_pruning_multi (a text) partition by range (substr(a,
1) collate "POSIX", substr(a, 1) collate "C");
The partexprs in RelOptInfo store substr(a,1), a FuncExpr, and its
collation is same with column a not collate "posix".
I'm now thinking how we can use a uniform solution to fix this thread issue
and issue in [1]/messages/by-id/18568-2a9afb6b9f7e6ed3@postgresql.org -- Thanks, Tender Wang
[1]: /messages/by-id/18568-2a9afb6b9f7e6ed3@postgresql.org -- Thanks, Tender Wang
/messages/by-id/18568-2a9afb6b9f7e6ed3@postgresql.org
--
Thanks,
Tender Wang
Tender Wang <tndrwang@gmail.com> 于2024年10月23日周三 21:48写道:
Hi all,
I find another issue as $SUBJECT when I work on [1].
When I continue to work on this, I find below issue. But I'm not sure
whether it is a bug.
postgres=# create table part_index(a text primary key) partition by list (
a collate "POSIX");
ERROR: unique constraint on partitioned table must include all
partitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a"
which is part of the partition key.
postgres=# create table part_index(a text) partition by list ( a collate
"POSIX");
CREATE TABLE
postgres=# alter table part_index add primary key (a);
ERROR: unique constraint on partitioned table must include all
partitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a"
which is part of the partition key.
It seems we can't create a primary key if the collation is different
between columnDef and PartitionKey.
By the way, I think the error message is misleading to users.
ostgres=# alter table part_index add primary key (a);
ERROR: unique constraint on partitioned table must include all
partitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a"
which is part of the partition key.
--
Thanks,
Tender Wang
Hi,
On Thu, Oct 24, 2024 at 1:46 PM Tender Wang <tndrwang@gmail.com> wrote:
Tender Wang <tndrwang@gmail.com> 于2024年10月23日周三 21:48写道:
Hi all,
I find another issue as $SUBJECT when I work on [1].
When I continue to work on this, I find below issue. But I'm not sure whether it is a bug.
postgres=# create table part_index(a text primary key) partition by list ( a collate "POSIX");
ERROR: unique constraint on partitioned table must include all partitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a" which is part of the partition key.
postgres=# create table part_index(a text) partition by list ( a collate "POSIX");
CREATE TABLE
postgres=# alter table part_index add primary key (a);
ERROR: unique constraint on partitioned table must include all partitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a" which is part of the partition key.It seems we can't create a primary key if the collation is different between columnDef and PartitionKey.
Yeah, you don't want to have the PK index and the partitioning logic
to not be in sync about the collation rules applied to the individual
rows.
By the way, I think the error message is misleading to users.
ostgres=# alter table part_index add primary key (a);
ERROR: unique constraint on partitioned table must include all partitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a" which is part of the partition key.
I think it's kind of similar to the message you get when a GROUP BY
column's collation doesn't match the column appearing in the SELECT
list:
explain SELECT c collate case_insensitive, count(c) FROM
pagg_tab_case_s GROUP BY c collate "C";
ERROR: column "pagg_tab_case_s.c" must appear in the GROUP BY clause
or be used in an aggregate function
LINE 1: explain SELECT c collate case_insensitive, count(c) FROM pag...
Perhaps it would be more helpful for the error message or hint or
detail to mention the actual discrepancy (collation mismatch) that's
causing the error.
There might be other instances of such an error and I am not sure it
would be worthwhile to find and fix them all.
--
Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> 于2024年10月24日周四 14:33写道:
Hi,
On Thu, Oct 24, 2024 at 1:46 PM Tender Wang <tndrwang@gmail.com> wrote:
Tender Wang <tndrwang@gmail.com> 于2024年10月23日周三 21:48写道:
Hi all,
I find another issue as $SUBJECT when I work on [1].
When I continue to work on this, I find below issue. But I'm not sure
whether it is a bug.
postgres=# create table part_index(a text primary key) partition by list
( a collate "POSIX");
ERROR: unique constraint on partitioned table must include all
partitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a"
which is part of the partition key.
postgres=# create table part_index(a text) partition by list ( a collate
"POSIX");
CREATE TABLE
postgres=# alter table part_index add primary key (a);
ERROR: unique constraint on partitioned table must include allpartitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a"
which is part of the partition key.
It seems we can't create a primary key if the collation is different
between columnDef and PartitionKey.
Yeah, you don't want to have the PK index and the partitioning logic
to not be in sync about the collation rules applied to the individual
rows.By the way, I think the error message is misleading to users.
ostgres=# alter table part_index add primary key (a);
ERROR: unique constraint on partitioned table must include allpartitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a"
which is part of the partition key.
I think it's kind of similar to the message you get when a GROUP BY
column's collation doesn't match the column appearing in the SELECT
list:explain SELECT c collate case_insensitive, count(c) FROM
pagg_tab_case_s GROUP BY c collate "C";
ERROR: column "pagg_tab_case_s.c" must appear in the GROUP BY clause
or be used in an aggregate function
LINE 1: explain SELECT c collate case_insensitive, count(c) FROM pag...Perhaps it would be more helpful for the error message or hint or
detail to mention the actual discrepancy (collation mismatch) that's
causing the error.There might be other instances of such an error and I am not sure it
would be worthwhile to find and fix them all.
Thanks for the explanation. We had better focus on the wrong result issue.
I feel that it's hard only to use one struct(for example, X), which just
calls equal(X, expr)
can check both the expression match and the collation match.
Maybe we should add another collation match checks in
match_clause_to_partition_key(), like
partition pruning logic does.
Any thoughts?
--
Thanks,
Tender Wang
On Thu, Oct 24, 2024 at 3:01 PM Tender Wang <tndrwang@gmail.com> wrote:
I feel that it's hard only to use one struct(for example, X), which just calls equal(X, expr)
can check both the expression match and the collation match.
in RelOptInfo->partexprs, maybe we should mention that the partition
key collation is stored
in RelOptInfo->part_scheme, not here.
Maybe we should add another collation match checks in match_clause_to_partition_key(), like
partition pruning logic does.
in match_clause_to_partition_key
we already have
else if (IsA(clause, OpExpr) &&
list_length(((OpExpr *) clause)->args) == 2)
{
/*
* Partition key match also requires collation match. There may be
* multiple partkeys with the same expression but different
* collations, so failure is NOMATCH.
*/
if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
return PARTCLAUSE_NOMATCH;
}
else if (IsA(clause, ScalarArrayOpExpr))
{
if (!equal(leftop, partkey) ||
!PartCollMatchesExprColl(partcoll, saop->inputcollid))
return PARTCLAUSE_NOMATCH;
}
So I think match_clause_to_partition_key handling collation is fine.
I think the problem is match_expr_to_partition_keys
don't have a collation related check.
CREATE TABLE pagg_join1 (c text collate case_insensitive) PARTITION BY
LIST(c collate "C");
CREATE TABLE pagg_join2 (c text collate "C") PARTITION BY LIST(c
collate case_insensitive);
CREATE TABLE pagg_join3 (c text collate "POSIX") PARTITION BY LIST(c
collate "C");
CREATE TABLE pagg_join4 (c text collate case_insensitive) PARTITION BY
LIST(c collate ignore_accents);
Our partition-wise join is based on Equi-join [1]https://en.wikipedia.org/wiki/Join_%28SQL%29#Equi-join.
In some cases,column and partitionkey collation are different,
but if these two collations are deterministic, then texteq should work
as expected.
So I think, pagg_join3 can do partition-wise join,
I think pagg_join2 can do partition-wise join also.
we can let all (pagg_join1, pagg_join2, pagg_join3, pagg_join4) cannot
do partition-wise join (join with themself),
or we can let pagg_join2, pagg_join3 do partition-wise join (join with
themself).
POC attached, will let pagg_join2, pagg_join3 do partition-wise join.
Attachments:
relnode.diffapplication/x-patch; name=relnode.diffDownload
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d7266e4cdb..6214d01794 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -74,7 +74,7 @@ static bool have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *rel1, RelOptInfo *rel2,
JoinType jointype, List *restrictlist);
static int match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
- bool strict_op);
+ bool strict_op, bool *coll_inderministic);
static void set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
JoinType jointype);
@@ -2104,6 +2104,7 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
Expr *expr1;
Expr *expr2;
bool strict_op;
+ bool coll_inderministic = false;
int ipk1;
int ipk2;
@@ -2167,10 +2168,11 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
* Only clauses referencing the partition keys are useful for
* partitionwise join.
*/
- ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op);
+ ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op, &coll_inderministic);
if (ipk1 < 0)
continue;
- ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op);
+
+ ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op, &coll_inderministic);
if (ipk2 < 0)
continue;
@@ -2181,6 +2183,10 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
if (ipk1 != ipk2)
continue;
+ /* if either collation is inderministic, cannot do partitionwise join */
+ if (coll_inderministic)
+ return false;
+
/* Ignore clause if we already proved these keys equal. */
if (pk_known_equal[ipk1])
continue;
@@ -2296,9 +2302,12 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
* strict_op must be true if the expression will be compared with the
* partition key using a strict operator. This allows us to consider
* nullable as well as nonnullable partition keys.
+ * coll_inderministic return true if exprCollation(expr) is inderministic. if
+ * expr is inderministic, that means same value with different apperance can
+ * live in different partition. In that case, we cannot do partition-wise join.
*/
static int
-match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
+match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op, bool *coll_inderministic)
{
int cnt;
@@ -2319,7 +2328,15 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
foreach(lc, rel->partexprs[cnt])
{
if (equal(lfirst(lc), expr))
+ {
+ Oid colloid = exprCollation((Node *) expr);
+
+ if (OidIsValid(colloid) &&
+ !get_collation_isdeterministic(colloid))
+ *coll_inderministic = true;
+
return cnt;
+ }
}
if (!strict_op)
@@ -2335,7 +2352,14 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
foreach(lc, rel->nullable_partexprs[cnt])
{
if (equal(lfirst(lc), expr))
+ {
+ Oid colloid = exprCollation((Node *) expr);
+
+ if (OidIsValid(colloid) &&
+ !get_collation_isdeterministic(colloid))
+ *coll_inderministic = true;
return cnt;
+ }
}
}
jian he <jian.universality@gmail.com> 于2024年10月29日周二 14:15写道:
On Thu, Oct 24, 2024 at 3:01 PM Tender Wang <tndrwang@gmail.com> wrote:
I feel that it's hard only to use one struct(for example, X), which just
calls equal(X, expr)
can check both the expression match and the collation match.
in RelOptInfo->partexprs, maybe we should mention that the partition
key collation is stored
in RelOptInfo->part_scheme, not here.Maybe we should add another collation match checks in
match_clause_to_partition_key(), like
partition pruning logic does.
in match_clause_to_partition_key
we already haveelse if (IsA(clause, OpExpr) &&
list_length(((OpExpr *) clause)->args) == 2)
{
/*
* Partition key match also requires collation match. There may be
* multiple partkeys with the same expression but different
* collations, so failure is NOMATCH.
*/
if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
return PARTCLAUSE_NOMATCH;
}
else if (IsA(clause, ScalarArrayOpExpr))
{
if (!equal(leftop, partkey) ||
!PartCollMatchesExprColl(partcoll, saop->inputcollid))
return PARTCLAUSE_NOMATCH;
}
So I think match_clause_to_partition_key handling collation is fine.I think the problem is match_expr_to_partition_keys
don't have a collation related check.
Sorry, it's a typo. It should be match_expr_to_partition_keys().
CREATE TABLE pagg_join1 (c text collate case_insensitive) PARTITION BY
LIST(c collate "C");
CREATE TABLE pagg_join2 (c text collate "C") PARTITION BY LIST(c
collate case_insensitive);
CREATE TABLE pagg_join3 (c text collate "POSIX") PARTITION BY LIST(c
collate "C");
CREATE TABLE pagg_join4 (c text collate case_insensitive) PARTITION BY
LIST(c collate ignore_accents);Our partition-wise join is based on Equi-join [1].
In some cases,column and partitionkey collation are different,
but if these two collations are deterministic, then texteq should work
as expected.
So I think, pagg_join3 can do partition-wise join,
I think pagg_join2 can do partition-wise join also.we can let all (pagg_join1, pagg_join2, pagg_join3, pagg_join4) cannot
do partition-wise join (join with themself),
or we can let pagg_join2, pagg_join3 do partition-wise join (join with
themself).POC attached, will let pagg_join2, pagg_join3 do partition-wise join.
Hmm, I'm not sure
--
Thanks,
Tender Wang
I missed a case when column collation and partition key collation are
the same and indeterministic.
that should be fine for partition-wise join.
so v2 attached.
have_partkey_equi_join, match_expr_to_partition_keys didn't do any
collation related check.
propose v2 change disallow partitionwise join for case when
column collation is indeterministic *and* is differ from partition
key's collation.
the attached partition_wise_join_collation.sql is the test script.
you may use it to compare with the master behavior.
Attachments:
v2-0001-partition_wise_join_collation_check.difftext/x-patch; charset=US-ASCII; name=v2-0001-partition_wise_join_collation_check.diffDownload
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d7266e4cdb..428751b05f 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -74,7 +74,7 @@ static bool have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *rel1, RelOptInfo *rel2,
JoinType jointype, List *restrictlist);
static int match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
- bool strict_op);
+ bool strict_op, bool *coll_incompatiable);
static void set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
JoinType jointype);
@@ -2104,6 +2104,7 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
Expr *expr1;
Expr *expr2;
bool strict_op;
+ bool coll_incompatiable = false;
int ipk1;
int ipk2;
@@ -2167,10 +2168,11 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
* Only clauses referencing the partition keys are useful for
* partitionwise join.
*/
- ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op);
+ ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op, &coll_incompatiable);
if (ipk1 < 0)
continue;
- ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op);
+
+ ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op, &coll_incompatiable);
if (ipk2 < 0)
continue;
@@ -2181,6 +2183,15 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
if (ipk1 != ipk2)
continue;
+ /*
+ * we generally assume parttion key and expr's collation are fine for
+ * partition-wise join. forgidden case is column collation is
+ * indeterministic and partition key's collation not same as column's.
+ * see match_expr_to_partition_keys also.
+ */
+ if (coll_incompatiable)
+ return false;
+
/* Ignore clause if we already proved these keys equal. */
if (pk_known_equal[ipk1])
continue;
@@ -2296,9 +2307,15 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
* strict_op must be true if the expression will be compared with the
* partition key using a strict operator. This allows us to consider
* nullable as well as nonnullable partition keys.
+ * if exprCollation(expr) is inderministic also not equal to partcollation,
+ * that means same value with different apperances can live in different
+ * partition, coll_incompatiable return set to true. In that case, we cannot do
+ * partition-wise join. we are OK with expression's collation same as partition
+ * key's even though they are indeterministic.
+ *
*/
static int
-match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
+match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op, bool *coll_incompatiable)
{
int cnt;
@@ -2315,11 +2332,22 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
{
ListCell *lc;
+ Oid partcoll = rel->part_scheme->partcollation[cnt];
+
/* We can always match to the non-nullable partition keys. */
foreach(lc, rel->partexprs[cnt])
{
if (equal(lfirst(lc), expr))
+ {
+ Oid colloid = exprCollation((Node *) expr);
+
+ if ((partcoll != colloid) &&
+ OidIsValid(colloid) &&
+ !get_collation_isdeterministic(colloid))
+ *coll_incompatiable = true;
+
return cnt;
+ }
}
if (!strict_op)
@@ -2335,7 +2363,15 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
foreach(lc, rel->nullable_partexprs[cnt])
{
if (equal(lfirst(lc), expr))
+ {
+ Oid colloid = exprCollation((Node *) expr);
+
+ if ((partcoll != colloid) &&
+ OidIsValid(colloid) &&
+ !get_collation_isdeterministic(colloid))
+ *coll_incompatiable = true;
return cnt;
+ }
}
}
On Wed, Oct 30, 2024 at 11:58 AM jian he <jian.universality@gmail.com> wrote:
I missed a case when column collation and partition key collation are
the same and indeterministic.
that should be fine for partition-wise join.
so v2 attached.have_partkey_equi_join, match_expr_to_partition_keys didn't do any
collation related check.
propose v2 change disallow partitionwise join for case when
column collation is indeterministic *and* is differ from partition
key's collation.the attached partition_wise_join_collation.sql is the test script.
you may use it to compare with the master behavior.
What if the partkey collation and column collation are both deterministic,
but with different sort order?
I'm not familiar with this part of the code base, but it seems to me the
partition wise join should use partkey collation instead of column collation,
because it's the partkey collation that decides which partition a row to
be dispatched.
What Jian proposed is also reasonable but seems another aspect of $subject?
Just some random thought, might be wrong ;(
--
Regards
Junwang Zhao
On Wed, Oct 30, 2024 at 9:36 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
On Wed, Oct 30, 2024 at 11:58 AM jian he <jian.universality@gmail.com> wrote:
I missed a case when column collation and partition key collation are
the same and indeterministic.
that should be fine for partition-wise join.
so v2 attached.have_partkey_equi_join, match_expr_to_partition_keys didn't do any
collation related check.
propose v2 change disallow partitionwise join for case when
column collation is indeterministic *and* is differ from partition
key's collation.the attached partition_wise_join_collation.sql is the test script.
you may use it to compare with the master behavior.What if the partkey collation and column collation are both deterministic,
but with different sort order?I'm not familiar with this part of the code base, but it seems to me the
partition wise join should use partkey collation instead of column collation,
because it's the partkey collation that decides which partition a row to
be dispatched.What Jian proposed is also reasonable but seems another aspect of $subject?
I think we should insist that the join key collation and the partition
collation are exactly the same and refuse to match them if they are
not.
+ {
+ Oid colloid = exprCollation((Node *) expr);
+
+ if ((partcoll != colloid) &&
+ OidIsValid(colloid) &&
+ !get_collation_isdeterministic(colloid))
+ *coll_incompatiable = true;
I am not quite sure what is the point of checking whether or not the
expression collation is deterministic after confirming that it's not
the same as partcoll.
Attached 0002 is what I came up with. One thing that's different from
what Jian proposed is that match_expr_to_partition_keys() returns -1
(expr not matched to any key) when the collation is also not matched
instead of using a separate output parameter for that.
0001 is the patch for the partitionwise grouping issue (bug #18568).
I concluded that even partial aggregate should be disallowed when the
grouping collation doesn't match partitioning collation. The result
with partial partitionwise grouping is not the same as when
enable_partitionwise_aggregate is off.
--
Thanks, Amit Langote
Attachments:
v3-0001-Disallow-partitionwise-grouping-when-collation-do.patchapplication/octet-stream; name=v3-0001-Disallow-partitionwise-grouping-when-collation-do.patchDownload
From 03cb00c55fef0f194c3c585f43f004d0a74542f5 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 31 Oct 2024 21:59:28 +0900
Subject: [PATCH v3 1/2] Disallow partitionwise grouping when collation doesn't
match
Insist that the collation used for grouping matches exactly with the
collation used for partitioning to allow using either full or
partial partitionwise grouping.
Bug: #18568
Reported-by: Webbo Han <1105066510@qq.com>
Author: Webbo Han <1105066510@qq.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org
Discussion: https://postgr.es/m/tencent_9D9103CDA420C07768349CC1DFF88465F90A@qq.com
Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
---
src/backend/optimizer/plan/planner.c | 53 +++++++++---
.../regress/expected/collate.icu.utf8.out | 85 +++++++++++++++++++
.../regress/expected/partition_aggregate.out | 11 ---
src/test/regress/sql/collate.icu.utf8.sql | 35 ++++++++
src/test/regress/sql/partition_aggregate.sql | 1 -
5 files changed, 163 insertions(+), 22 deletions(-)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0f423e9684..60bb86a99c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -253,7 +253,8 @@ static void create_partitionwise_grouping_paths(PlannerInfo *root,
GroupPathExtraData *extra);
static bool group_by_has_partkey(RelOptInfo *input_rel,
List *targetList,
- List *groupClause);
+ List *groupClause,
+ bool *collation_incompatible);
static int common_prefix_cmp(const void *a, const void *b);
static List *generate_setop_child_grouplist(SetOperationStmt *op,
List *targetlist);
@@ -4090,6 +4091,12 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
if (extra->patype != PARTITIONWISE_AGGREGATE_NONE &&
IS_PARTITIONED_REL(input_rel))
{
+ bool collation_incompatible = false;
+ bool group_by_contains_partkey =
+ group_by_has_partkey(input_rel, extra->targetList,
+ root->parse->groupClause,
+ &collation_incompatible);
+
/*
* If this is the topmost relation or if the parent relation is doing
* full partitionwise aggregation, then we can do full partitionwise
@@ -4103,10 +4110,10 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
* okay if some of the partitioning columns were proved redundant.
*/
if (extra->patype == PARTITIONWISE_AGGREGATE_FULL &&
- group_by_has_partkey(input_rel, extra->targetList,
- root->parse->groupClause))
+ group_by_contains_partkey)
patype = PARTITIONWISE_AGGREGATE_FULL;
- else if ((extra->flags & GROUPING_CAN_PARTIAL_AGG) != 0)
+ else if ((extra->flags & GROUPING_CAN_PARTIAL_AGG) != 0 &&
+ !collation_incompatible)
patype = PARTITIONWISE_AGGREGATE_PARTIAL;
else
patype = PARTITIONWISE_AGGREGATE_NONE;
@@ -8105,13 +8112,14 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
/*
* group_by_has_partkey
*
- * Returns true, if all the partition keys of the given relation are part of
- * the GROUP BY clauses, false otherwise.
+ * Returns true if all the partition keys of the given relation are part of
+ * the GROUP BY clauses, including having matching collation, false otherwise.
*/
static bool
group_by_has_partkey(RelOptInfo *input_rel,
List *targetList,
- List *groupClause)
+ List *groupClause,
+ bool *collation_incompatible)
{
List *groupexprs = get_sortgrouplist_exprs(groupClause, targetList);
int cnt = 0;
@@ -8134,13 +8142,38 @@ group_by_has_partkey(RelOptInfo *input_rel,
foreach(lc, partexprs)
{
+ ListCell *lg;
Expr *partexpr = lfirst(lc);
+ Oid partcoll = input_rel->part_scheme->partcollation[cnt];
- if (list_member(groupexprs, partexpr))
+ foreach (lg, groupexprs)
{
- found = true;
- break;
+ Expr *groupexpr = lfirst(lg);
+ Oid groupcoll = exprCollation((Node *) groupexpr);
+
+ if (IsA(groupexpr, RelabelType))
+ groupexpr = ((RelabelType *) groupexpr)->arg;
+
+ if (OidIsValid(partcoll) && OidIsValid(groupcoll) &&
+ partcoll != groupcoll)
+ {
+ *collation_incompatible = true;
+ return false;
+ }
+
+ /*
+ * Ensure that the grouping collation is compatible with the
+ * partitioning collation.
+ */
+ if (equal(groupexpr, partexpr))
+ {
+ found = true;
+ break;
+ }
}
+
+ if (found)
+ break;
}
/*
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index faa376e060..3d6d8f9a20 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2054,6 +2054,91 @@ SELECT (SELECT count(*) FROM test33_0) <> (SELECT count(*) FROM test33_1);
t
(1 row)
+--
+-- Bug #18568
+--
+-- Partitionwise aggregate (full or partial) should not be used when a
+-- partition key's collation doesn't match that of the GROUP BY column it is
+-- matched with.
+SET max_parallel_workers_per_gather TO 0;
+SET enable_incremental_sort TO off;
+CREATE TABLE pagg_tab3 (c text collate case_insensitive) PARTITION BY LIST(c collate "C");
+CREATE TABLE pagg_tab3_p1 PARTITION OF pagg_tab3 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab3_p2 PARTITION OF pagg_tab3 FOR VALUES IN ('B', 'A');
+INSERT INTO pagg_tab3 SELECT substr('abAB', (i % 4) +1 , 1) FROM generate_series(0, 7) i;
+ANALYZE pagg_tab3;
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ QUERY PLAN
+-----------------------------------------------------------
+ Sort
+ Sort Key: (upper(pagg_tab3.c)) COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: pagg_tab3.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3_1
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_2
+(7 rows)
+
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ upper | count
+-------+-------
+ A | 4
+ B | 4
+(2 rows)
+
+-- No partitionwise aggregation allowed!
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ QUERY PLAN
+-----------------------------------------------------------
+ Sort
+ Sort Key: (upper(pagg_tab3.c)) COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: pagg_tab3.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3_1
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_2
+(7 rows)
+
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ upper | count
+-------+-------
+ A | 4
+ B | 4
+(2 rows)
+
+-- OK to use full partitionwise aggregate after changing the GROUP BY column's
+-- collation to be the same as that of the partition key.
+EXPLAIN (COSTS OFF)
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+ QUERY PLAN
+--------------------------------------------------------
+ Sort
+ Sort Key: ((pagg_tab3.c)::text) COLLATE "C"
+ -> Append
+ -> HashAggregate
+ Group Key: (pagg_tab3.c)::text
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3
+ -> HashAggregate
+ Group Key: (pagg_tab3_1.c)::text
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_1
+(9 rows)
+
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+ c | count
+---+-------
+ A | 2
+ B | 2
+ a | 2
+ b | 2
+(4 rows)
+
+RESET enable_partitionwise_aggregate;
+RESET max_parallel_workers_per_gather;
+RESET enable_incremental_sort;
-- cleanup
RESET search_path;
SET client_min_messages TO warning;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index 5f2c0cf578..670eb98906 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -1507,14 +1507,3 @@ SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) <
-> Seq Scan on pagg_tab_para_p3 pagg_tab_para_2
(15 rows)
-SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3;
- x | sum | avg | count
-----+------+--------------------+-------
- 0 | 5000 | 5.0000000000000000 | 1000
- 1 | 6000 | 6.0000000000000000 | 1000
- 10 | 5000 | 5.0000000000000000 | 1000
- 11 | 6000 | 6.0000000000000000 | 1000
- 20 | 5000 | 5.0000000000000000 | 1000
- 21 | 6000 | 6.0000000000000000 | 1000
-(6 rows)
-
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 80f28a97d7..eaf9a99be7 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -796,6 +796,41 @@ INSERT INTO test33 VALUES (2, 'DEF');
-- they end up in the same partition (but it's platform-dependent which one)
SELECT (SELECT count(*) FROM test33_0) <> (SELECT count(*) FROM test33_1);
+--
+-- Bug #18568
+--
+-- Partitionwise aggregate (full or partial) should not be used when a
+-- partition key's collation doesn't match that of the GROUP BY column it is
+-- matched with.
+SET max_parallel_workers_per_gather TO 0;
+SET enable_incremental_sort TO off;
+
+CREATE TABLE pagg_tab3 (c text collate case_insensitive) PARTITION BY LIST(c collate "C");
+CREATE TABLE pagg_tab3_p1 PARTITION OF pagg_tab3 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab3_p2 PARTITION OF pagg_tab3 FOR VALUES IN ('B', 'A');
+INSERT INTO pagg_tab3 SELECT substr('abAB', (i % 4) +1 , 1) FROM generate_series(0, 7) i;
+ANALYZE pagg_tab3;
+
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+
+-- No partitionwise aggregation allowed!
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+
+-- OK to use full partitionwise aggregate after changing the GROUP BY column's
+-- collation to be the same as that of the partition key.
+EXPLAIN (COSTS OFF)
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+
+RESET enable_partitionwise_aggregate;
+RESET max_parallel_workers_per_gather;
+RESET enable_incremental_sort;
-- cleanup
RESET search_path;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee24..1e263c1caf 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -333,4 +333,3 @@ RESET parallel_setup_cost;
EXPLAIN (COSTS OFF)
SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3;
-SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3;
--
2.43.0
v3-0002-Disallow-partitionwise-join-when-collation-doesn-.patchapplication/octet-stream; name=v3-0002-Disallow-partitionwise-join-when-collation-doesn-.patchDownload
From ccf9f6cfe2a8078961ce2250ec56884ac12b6e01 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 31 Oct 2024 21:58:30 +0900
Subject: [PATCH v3 2/2] Disallow partitionwise join when collation doesn't
match
Insist that the collation used for joining matches exactly with the
collation used for partitioning to allow using partitionwise join.
Reported-by: Tender Wang <tndrwang@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org
Discussion: https://postgr.es/m/tencent_9D9103CDA420C07768349CC1DFF88465F90A@qq.com
Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
---
src/backend/optimizer/util/relnode.c | 26 +++++-
.../regress/expected/collate.icu.utf8.out | 89 +++++++++++++++++++
src/test/regress/sql/collate.icu.utf8.sql | 17 ++++
3 files changed, 130 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d7266e4cdb..0dfcd3a556 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -2258,6 +2258,8 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
{
Node *expr1 = (Node *) lfirst(lc);
ListCell *lc2;
+ Oid partcoll1 = rel1->part_scheme->partcollation[ipk];
+ Oid exprcoll1 = exprCollation(expr1);
foreach(lc2, rel2->partexprs[ipk])
{
@@ -2265,8 +2267,18 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
if (exprs_known_equal(root, expr1, expr2, btree_opfamily))
{
- pk_known_equal[ipk] = true;
- break;
+ Oid partcoll2 = rel1->part_scheme->partcollation[ipk];
+ Oid exprcoll2 = exprCollation(expr2);
+
+ /*
+ * Ensure that the collations match those of the partition
+ * keys.
+ */
+ if (partcoll1 == exprcoll1 && partcoll2 == exprcoll2)
+ {
+ pk_known_equal[ipk] = true;
+ break;
+ }
}
}
if (pk_known_equal[ipk])
@@ -2301,6 +2313,7 @@ static int
match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
{
int cnt;
+ Oid exprcoll = exprCollation((Node *) expr);
/* This function should be called only for partitioned relations. */
Assert(rel->part_scheme);
@@ -2314,10 +2327,15 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
for (cnt = 0; cnt < rel->part_scheme->partnatts; cnt++)
{
ListCell *lc;
+ Oid partcoll = rel->part_scheme->partcollation[cnt];
/* We can always match to the non-nullable partition keys. */
foreach(lc, rel->partexprs[cnt])
{
+ if (OidIsValid(partcoll) && OidIsValid(exprcoll) &&
+ partcoll != exprcoll)
+ return -1;
+
if (equal(lfirst(lc), expr))
return cnt;
}
@@ -2334,6 +2352,10 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
*/
foreach(lc, rel->nullable_partexprs[cnt])
{
+ if (OidIsValid(partcoll) && OidIsValid(exprcoll) &&
+ partcoll != exprcoll)
+ return -1;
+
if (equal(lfirst(lc), expr))
return cnt;
}
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 3d6d8f9a20..56239737cd 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2136,6 +2136,95 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
b | 2
(4 rows)
+-- Partitionwise join should not be allowed too when the collation used by the
+-- join keys doesn't match the partition key
+SET enable_partitionwise_join TO false;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: t1.c
+ -> Hash Join
+ Hash Cond: (t1.c = t2.c)
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t1_1
+ -> Seq Scan on pagg_tab3_p1 t1_2
+ -> Hash
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(13 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+ c | count
+---+-------
+ A | 16
+ B | 16
+(2 rows)
+
+SET enable_partitionwise_join TO true;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: t1.c
+ -> Hash Join
+ Hash Cond: (t1.c = t2.c)
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t1_1
+ -> Seq Scan on pagg_tab3_p1 t1_2
+ -> Hash
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(13 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+ c | count
+---+-------
+ A | 16
+ B | 16
+(2 rows)
+
+-- OK when the join key uses the same collation.
+EXPLAIN (COSTS OFF)
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY 1;
+ QUERY PLAN
+------------------------------------------------------------------
+ Sort
+ Sort Key: ((t1.c)::text) COLLATE "C"
+ -> Append
+ -> HashAggregate
+ Group Key: (t1.c)::text
+ -> Hash Join
+ Hash Cond: ((t1.c)::text = (t2.c)::text)
+ -> Seq Scan on pagg_tab3_p2 t1
+ -> Hash
+ -> Seq Scan on pagg_tab3_p2 t2
+ -> HashAggregate
+ Group Key: (t1_1.c)::text
+ -> Hash Join
+ Hash Cond: ((t1_1.c)::text = (t2_1.c)::text)
+ -> Seq Scan on pagg_tab3_p1 t1_1
+ -> Hash
+ -> Seq Scan on pagg_tab3_p1 t2_1
+(17 rows)
+
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY 1;
+ c | count
+---+-------
+ A | 4
+ B | 4
+ a | 4
+ b | 4
+(4 rows)
+
RESET enable_partitionwise_aggregate;
RESET max_parallel_workers_per_gather;
RESET enable_incremental_sort;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index eaf9a99be7..b5f872742b 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -828,6 +828,23 @@ EXPLAIN (COSTS OFF)
SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+-- Partitionwise join should not be allowed too when the collation used by the
+-- join keys doesn't match the partition key
+SET enable_partitionwise_join TO false;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+
+SET enable_partitionwise_join TO true;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+
+-- OK when the join key uses the same collation.
+EXPLAIN (COSTS OFF)
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY 1;
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY 1;
+
RESET enable_partitionwise_aggregate;
RESET max_parallel_workers_per_gather;
RESET enable_incremental_sort;
--
2.43.0
Amit Langote <amitlangote09@gmail.com> 于2024年10月31日周四 21:09写道:
On Wed, Oct 30, 2024 at 9:36 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
On Wed, Oct 30, 2024 at 11:58 AM jian he <jian.universality@gmail.com>
wrote:
I missed a case when column collation and partition key collation are
the same and indeterministic.
that should be fine for partition-wise join.
so v2 attached.have_partkey_equi_join, match_expr_to_partition_keys didn't do any
collation related check.
propose v2 change disallow partitionwise join for case when
column collation is indeterministic *and* is differ from partition
key's collation.the attached partition_wise_join_collation.sql is the test script.
you may use it to compare with the master behavior.What if the partkey collation and column collation are both
deterministic,
but with different sort order?
I'm not familiar with this part of the code base, but it seems to me the
partition wise join should use partkey collation instead of columncollation,
because it's the partkey collation that decides which partition a row to
be dispatched.What Jian proposed is also reasonable but seems another aspect of
$subject?
I think we should insist that the join key collation and the partition
collation are exactly the same and refuse to match them if they are
not.+ { + Oid colloid = exprCollation((Node *) expr); + + if ((partcoll != colloid) && + OidIsValid(colloid) && + !get_collation_isdeterministic(colloid)) + *coll_incompatiable = true;I am not quite sure what is the point of checking whether or not the
expression collation is deterministic after confirming that it's not
the same as partcoll.
Me, too.
Attached 0002 is what I came up with. One thing that's different from
what Jian proposed is that match_expr_to_partition_keys() returns -1
Agree.
(expr not matched to any key) when the collation is also not matched
instead of using a separate output parameter for that.
In have_partkey_equi_join()
...
if (exprs_known_equal(root, expr1, expr2, btree_opfamily))
{
Oid partcoll2 = rel1->part_scheme->partcollation[ipk];
....
I think we should use rel2 here, not rel1.
--
Thanks,
Tender Wang
Amit Langote <amitlangote09@gmail.com> 于2024年10月31日周四 21:09写道:
0001 is the patch for the partitionwise grouping issue (bug #18568).
I concluded that even partial aggregate should be disallowed when the
grouping collation doesn't match partitioning collation. The result
with partial partitionwise grouping is not the same as when
enable_partitionwise_aggregate is off.
LGTM
--
Thanks,
Tender Wang
On Thu, Oct 31, 2024 at 9:09 PM Amit Langote <amitlangote09@gmail.com> wrote:
I think we should insist that the join key collation and the partition
collation are exactly the same and refuse to match them if they are
not.+ { + Oid colloid = exprCollation((Node *) expr); + + if ((partcoll != colloid) && + OidIsValid(colloid) && + !get_collation_isdeterministic(colloid)) + *coll_incompatiable = true;I am not quite sure what is the point of checking whether or not the
expression collation is deterministic after confirming that it's not
the same as partcoll.Attached 0002 is what I came up with. One thing that's different from
what Jian proposed is that match_expr_to_partition_keys() returns -1
(expr not matched to any key) when the collation is also not matched
instead of using a separate output parameter for that.
i was thinking that
CREATE TABLE part_tab (c text collate "POSIX") PARTITION BY LIST(c collate "C");
maybe can do partitionwise join.
join key collation and the partition key collation same sure would
make things easy.
about 0002.
Similar to PartCollMatchesExprColl in match_clause_to_partition_key
I think we can simply do the following:
no need to hack match_expr_to_partition_keys.
@@ -2181,6 +2181,9 @@ have_partkey_equi_join(PlannerInfo *root,
RelOptInfo *joinrel,
if (ipk1 != ipk2)
continue;
+ if (rel1->part_scheme->partcollation[ipk1] !=
opexpr->inputcollid)
+ return false;
Hi,
On Fri, Nov 1, 2024 at 11:39 AM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2024年10月31日周四 21:09写道:
On Wed, Oct 30, 2024 at 9:36 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
On Wed, Oct 30, 2024 at 11:58 AM jian he <jian.universality@gmail.com> wrote:
In have_partkey_equi_join()
...
if (exprs_known_equal(root, expr1, expr2, btree_opfamily))
{
Oid partcoll2 = rel1->part_scheme->partcollation[ipk];
....
I think we should use rel2 here, not rel1.
Hmm, yeah, this should be fixed. Though, this is not a problem
because both rel1 and rel2 would be pointing to the same
PartitionScheme, because otherwise we wouldn't be in
have_partkey_equi_join(). See this in build_joinrel_partition_info():
/*
* We can only consider this join as an input to further partitionwise
* joins if (a) the input relations are partitioned and have
* consider_partitionwise_join=true, (b) the partition schemes match, and
* (c) we can identify an equi-join between the partition keys. Note that
* if it were possible for have_partkey_equi_join to return different
* answers for the same joinrel depending on which join ordering we try
* first, this logic would break. That shouldn't happen, though, because
* of the way the query planner deduces implied equalities and reorders
* the joins. Please see optimizer/README for details.
*/
if (outer_rel->part_scheme == NULL || inner_rel->part_scheme == NULL ||
!outer_rel->consider_partitionwise_join ||
!inner_rel->consider_partitionwise_join ||
outer_rel->part_scheme != inner_rel->part_scheme ||
!have_partkey_equi_join(root, joinrel, outer_rel, inner_rel,
sjinfo->jointype, restrictlist))
{
Assert(!IS_PARTITIONED_REL(joinrel));
return;
}
I've changed the condition to match only partcoll1 and exprcoll1, and
if they match, Assert that partcoll2 and exprcoll2 match too. That's
because partcoll1 and partcoll2 would be the same as they are from the
same PartitionScheme and expr1 and expr2 are known equal() so their
collations should match too.
--
Thanks, Amit Langote
Hi,
On Fri, Nov 1, 2024 at 2:39 PM jian he <jian.universality@gmail.com> wrote:
On Thu, Oct 31, 2024 at 9:09 PM Amit Langote <amitlangote09@gmail.com> wrote:
I think we should insist that the join key collation and the partition
collation are exactly the same and refuse to match them if they are
not.+ { + Oid colloid = exprCollation((Node *) expr); + + if ((partcoll != colloid) && + OidIsValid(colloid) && + !get_collation_isdeterministic(colloid)) + *coll_incompatiable = true;I am not quite sure what is the point of checking whether or not the
expression collation is deterministic after confirming that it's not
the same as partcoll.Attached 0002 is what I came up with. One thing that's different from
what Jian proposed is that match_expr_to_partition_keys() returns -1
(expr not matched to any key) when the collation is also not matched
instead of using a separate output parameter for that.i was thinking that
CREATE TABLE part_tab (c text collate "POSIX") PARTITION BY LIST(c collate "C");
maybe can do partitionwise join.
join key collation and the partition key collation same sure would
make things easy.
I think that's maybe ok to do as a new feature (use partitionwise join
even if collations differ but are both deterministic?), but we should
take a more restrictive approach in a bug fix that is to be
back-patched.
about 0002.
Similar to PartCollMatchesExprColl in match_clause_to_partition_key
I think we can simply do the following:
no need to hack match_expr_to_partition_keys.@@ -2181,6 +2181,9 @@ have_partkey_equi_join(PlannerInfo *root,
RelOptInfo *joinrel,
if (ipk1 != ipk2)
continue;+ if (rel1->part_scheme->partcollation[ipk1] != opexpr->inputcollid) + return false;
Yes, looks like that should be enough, thanks.
I've updated the patch. I've added another test case to test the new
collation matching code in the code block of have_partkey_equi_join()
that pairs partition keys via equivalence class.
Adding Ashutosh to cc, as the original author of this code, to get his
thoughts on these fixes.
--
Thanks, Amit Langote
Attachments:
v4-0002-Disallow-partitionwise-join-when-collation-doesn-.patchapplication/octet-stream; name=v4-0002-Disallow-partitionwise-join-when-collation-doesn-.patchDownload
From b9cd75e360e47aeed008d8060473c617b0f6deb1 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 1 Nov 2024 15:32:33 +0900
Subject: [PATCH v4 2/2] Disallow partitionwise join when collation doesn't
match
Insist that the collation used for joining matches exactly with the
collation used for partitioning to allow using partitionwise join.
Reported-by: Tender Wang <tndrwang@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org
Discussion: https://postgr.es/m/tencent_9D9103CDA420C07768349CC1DFF88465F90A@qq.com
Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
---
src/backend/optimizer/util/relnode.c | 28 +++-
.../regress/expected/collate.icu.utf8.out | 154 ++++++++++++++++++
src/test/regress/sql/collate.icu.utf8.sql | 35 ++++
3 files changed, 215 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d7266e4cdb..489fe6c26f 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -2185,6 +2185,10 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
if (pk_known_equal[ipk1])
continue;
+ /* Reject if the partition key collation differs from the clause's. */
+ if (rel1->part_scheme->partcollation[ipk1] != opexpr->inputcollid)
+ return false;
+
/*
* The clause allows partitionwise join only if it uses the same
* operator family as that specified by the partition key.
@@ -2258,6 +2262,8 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
{
Node *expr1 = (Node *) lfirst(lc);
ListCell *lc2;
+ Oid partcoll1 = rel1->part_scheme->partcollation[ipk];
+ Oid exprcoll1 = exprCollation(expr1);
foreach(lc2, rel2->partexprs[ipk])
{
@@ -2265,8 +2271,26 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
if (exprs_known_equal(root, expr1, expr2, btree_opfamily))
{
- pk_known_equal[ipk] = true;
- break;
+ /*
+ * Ensure that the collation of the expression matches
+ * that of the partition key. Checking just one collation
+ * (partcoll1 and exprcoll1) suffices because partcoll1
+ * and partcoll2, as well as exprcoll1 and exprcoll2,
+ * should be identical. This holds because both rel1 and
+ * rel2 use the same PartitionScheme and expr1 and expr2
+ * are equal.
+ */
+ if (partcoll1 == exprcoll1)
+ {
+ Oid partcoll2 PG_USED_FOR_ASSERTS_ONLY =
+ rel1->part_scheme->partcollation[ipk];
+ Oid exprcoll2 PG_USED_FOR_ASSERTS_ONLY =
+ exprCollation(expr2);
+
+ Assert(partcoll2 == exprcoll2);
+ pk_known_equal[ipk] = true;
+ break;
+ }
}
}
if (pk_known_equal[ipk])
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 737cf363a2..abc3ce23a4 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2136,7 +2136,161 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
b | 2
(4 rows)
+-- Partitionwise join should not be allowed too when the collation used by the
+-- join keys doesn't match the partition key collation.
+SET enable_partitionwise_join TO false;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: t1.c
+ -> Hash Join
+ Hash Cond: (t1.c = t2.c)
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t1_1
+ -> Seq Scan on pagg_tab3_p1 t1_2
+ -> Hash
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(13 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+ c | count
+---+-------
+ A | 16
+ B | 16
+(2 rows)
+
+SET enable_partitionwise_join TO true;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: t1.c
+ -> Hash Join
+ Hash Cond: (t1.c = t2.c)
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t1_1
+ -> Seq Scan on pagg_tab3_p1 t1_2
+ -> Hash
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(13 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+ c | count
+---+-------
+ A | 16
+ B | 16
+(2 rows)
+
+-- OK when the join clause uses the same collation as the partition key.
+EXPLAIN (COSTS OFF)
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY 1;
+ QUERY PLAN
+------------------------------------------------------------------
+ Sort
+ Sort Key: ((t1.c)::text) COLLATE "C"
+ -> Append
+ -> HashAggregate
+ Group Key: (t1.c)::text
+ -> Hash Join
+ Hash Cond: ((t1.c)::text = (t2.c)::text)
+ -> Seq Scan on pagg_tab3_p2 t1
+ -> Hash
+ -> Seq Scan on pagg_tab3_p2 t2
+ -> HashAggregate
+ Group Key: (t1_1.c)::text
+ -> Hash Join
+ Hash Cond: ((t1_1.c)::text = (t2_1.c)::text)
+ -> Seq Scan on pagg_tab3_p1 t1_1
+ -> Hash
+ -> Seq Scan on pagg_tab3_p1 t2_1
+(17 rows)
+
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY 1;
+ c | count
+---+-------
+ A | 4
+ B | 4
+ a | 4
+ b | 4
+(4 rows)
+
+-- Another case where the partition keys are matched via equivalence class,
+-- not a join restriction clause.
+CREATE TABLE pagg_tab4 (c text collate case_insensitive, b text collate case_insensitive) PARTITION BY LIST(b collate "C");
+CREATE TABLE pagg_tab4_p1 PARTITION OF pagg_tab4 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab4_p2 PARTITION OF pagg_tab4 FOR VALUES IN ('B', 'A');
+INSERT INTO pagg_tab4 (b, c) SELECT substr('abAB', (i % 4) + 1 , 1), substr('abAB', (i % 2) + 1 , 1) FROM generate_series(0, 7) i;
+ANALYZE pagg_tab4;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY 1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: t1.c
+ -> Hash Join
+ Hash Cond: (t1.c = t2.c)
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t1_1
+ -> Seq Scan on pagg_tab3_p1 t1_2
+ -> Hash
+ -> Append
+ -> Seq Scan on pagg_tab4_p2 t2_1
+ Filter: (c = b)
+ -> Seq Scan on pagg_tab4_p1 t2_2
+ Filter: (c = b)
+(15 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY 1;
+ c | count
+---+-------
+ A | 16
+ B | 16
+(2 rows)
+
+-- OK when the join clause uses the same collation as the partition key
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY 1 ORDER BY 1;
+ QUERY PLAN
+------------------------------------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: t1.c
+ -> Append
+ -> Hash Join
+ Hash Cond: ((t1_1.c = t2_1.c) AND ((t1_1.c)::text = (t2_1.b)::text))
+ -> Seq Scan on pagg_tab3_p2 t1_1
+ -> Hash
+ -> Seq Scan on pagg_tab4_p2 t2_1
+ -> Hash Join
+ Hash Cond: ((t1_2.c = t2_2.c) AND ((t1_2.c)::text = (t2_2.b)::text))
+ -> Seq Scan on pagg_tab3_p1 t1_2
+ -> Hash
+ -> Seq Scan on pagg_tab4_p1 t2_2
+(15 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY 1 ORDER BY 1;
+ c | count
+---+-------
+ A | 8
+ B | 8
+(2 rows)
+
DROP TABLE pagg_tab3;
+DROP TABLE pagg_tab4;
RESET enable_partitionwise_aggregate;
RESET max_parallel_workers_per_gather;
RESET enable_incremental_sort;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index ca1bb7c1f2..2f37a1a97b 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -828,7 +828,42 @@ EXPLAIN (COSTS OFF)
SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+-- Partitionwise join should not be allowed too when the collation used by the
+-- join keys doesn't match the partition key collation.
+SET enable_partitionwise_join TO false;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+
+SET enable_partitionwise_join TO true;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+
+-- OK when the join clause uses the same collation as the partition key.
+EXPLAIN (COSTS OFF)
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY 1;
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY 1;
+
+-- Another case where the partition keys are matched via equivalence class,
+-- not a join restriction clause.
+CREATE TABLE pagg_tab4 (c text collate case_insensitive, b text collate case_insensitive) PARTITION BY LIST(b collate "C");
+CREATE TABLE pagg_tab4_p1 PARTITION OF pagg_tab4 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab4_p2 PARTITION OF pagg_tab4 FOR VALUES IN ('B', 'A');
+INSERT INTO pagg_tab4 (b, c) SELECT substr('abAB', (i % 4) + 1 , 1), substr('abAB', (i % 2) + 1 , 1) FROM generate_series(0, 7) i;
+ANALYZE pagg_tab4;
+
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY 1;
+
+-- OK when the join clause uses the same collation as the partition key
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY 1 ORDER BY 1;
+
DROP TABLE pagg_tab3;
+DROP TABLE pagg_tab4;
RESET enable_partitionwise_aggregate;
RESET max_parallel_workers_per_gather;
--
2.43.0
v4-0001-Disallow-partitionwise-grouping-when-collation-do.patchapplication/octet-stream; name=v4-0001-Disallow-partitionwise-grouping-when-collation-do.patchDownload
From d8f6753eeb70349ccb45d21dc741dbe6fbaaf2a5 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 1 Nov 2024 16:15:50 +0900
Subject: [PATCH v4 1/2] Disallow partitionwise grouping when collation doesn't
match
Insist that the collation used for grouping matches exactly with the
collation used for partitioning to allow using either full or
partial partitionwise grouping.
Bug: #18568
Reported-by: Webbo Han <1105066510@qq.com>
Author: Webbo Han <1105066510@qq.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org
Discussion: https://postgr.es/m/tencent_9D9103CDA420C07768349CC1DFF88465F90A@qq.com
Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
---
src/backend/optimizer/plan/planner.c | 66 +++++++++++---
.../regress/expected/collate.icu.utf8.out | 86 +++++++++++++++++++
.../regress/expected/partition_aggregate.out | 11 ---
src/test/regress/sql/collate.icu.utf8.sql | 37 ++++++++
src/test/regress/sql/partition_aggregate.sql | 1 -
5 files changed, 175 insertions(+), 26 deletions(-)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0f423e9684..4044b13071 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -253,7 +253,8 @@ static void create_partitionwise_grouping_paths(PlannerInfo *root,
GroupPathExtraData *extra);
static bool group_by_has_partkey(RelOptInfo *input_rel,
List *targetList,
- List *groupClause);
+ List *groupClause,
+ bool *collation_incompatible);
static int common_prefix_cmp(const void *a, const void *b);
static List *generate_setop_child_grouplist(SetOperationStmt *op,
List *targetlist);
@@ -4090,23 +4091,30 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
if (extra->patype != PARTITIONWISE_AGGREGATE_NONE &&
IS_PARTITIONED_REL(input_rel))
{
+ bool collation_incompatible = false;
+ bool group_by_contains_partkey =
+ group_by_has_partkey(input_rel, extra->targetList,
+ root->parse->groupClause,
+ &collation_incompatible);
+
/*
* If this is the topmost relation or if the parent relation is doing
* full partitionwise aggregation, then we can do full partitionwise
* aggregation provided that the GROUP BY clause contains all of the
- * partitioning columns at this level. Otherwise, we can do at most
- * partial partitionwise aggregation. But if partial aggregation is
- * not supported in general then we can't use it for partitionwise
- * aggregation either.
+ * partitioning columns at this level and the collation used by GROUP
+ * BY matches the partitioning collation. Otherwise, we can do at most
+ * partial partitionwise aggregation, but again only if the collation
+ * is compatible. If partial aggregation is not supported in general
+ * then we can't use it for partitionwise aggregation either.
*
* Check parse->groupClause not processed_groupClause, because it's
* okay if some of the partitioning columns were proved redundant.
*/
if (extra->patype == PARTITIONWISE_AGGREGATE_FULL &&
- group_by_has_partkey(input_rel, extra->targetList,
- root->parse->groupClause))
+ group_by_contains_partkey)
patype = PARTITIONWISE_AGGREGATE_FULL;
- else if ((extra->flags & GROUPING_CAN_PARTIAL_AGG) != 0)
+ else if ((extra->flags & GROUPING_CAN_PARTIAL_AGG) != 0 &&
+ !collation_incompatible)
patype = PARTITIONWISE_AGGREGATE_PARTIAL;
else
patype = PARTITIONWISE_AGGREGATE_NONE;
@@ -8105,13 +8113,18 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
/*
* group_by_has_partkey
*
- * Returns true, if all the partition keys of the given relation are part of
- * the GROUP BY clauses, false otherwise.
+ * Returns true if all the partition keys of the given relation are part of
+ * the GROUP BY clauses, including having matching collation, false otherwise.
+ *
+ * Returns false also if a collation mismatch is detected between a partition
+ * key and its corresponding expression in groupClause, in which case.
+ * *collation_incompatible is set to true.
*/
static bool
group_by_has_partkey(RelOptInfo *input_rel,
List *targetList,
- List *groupClause)
+ List *groupClause,
+ bool *collation_incompatible)
{
List *groupexprs = get_sortgrouplist_exprs(groupClause, targetList);
int cnt = 0;
@@ -8134,13 +8147,38 @@ group_by_has_partkey(RelOptInfo *input_rel,
foreach(lc, partexprs)
{
+ ListCell *lg;
Expr *partexpr = lfirst(lc);
+ Oid partcoll = input_rel->part_scheme->partcollation[cnt];
- if (list_member(groupexprs, partexpr))
+ foreach(lg, groupexprs)
{
- found = true;
- break;
+ Expr *groupexpr = lfirst(lg);
+ Oid groupcoll = exprCollation((Node *) groupexpr);
+
+ if (IsA(groupexpr, RelabelType))
+ groupexpr = ((RelabelType *) groupexpr)->arg;
+
+ /*
+ * Ensure that the grouping collation is compatible with the
+ * partitioning collation.
+ */
+ if (OidIsValid(partcoll) && OidIsValid(groupcoll) &&
+ partcoll != groupcoll)
+ {
+ *collation_incompatible = true;
+ return false;
+ }
+
+ if (equal(groupexpr, partexpr))
+ {
+ found = true;
+ break;
+ }
}
+
+ if (found)
+ break;
}
/*
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index faa376e060..737cf363a2 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2054,6 +2054,92 @@ SELECT (SELECT count(*) FROM test33_0) <> (SELECT count(*) FROM test33_1);
t
(1 row)
+--
+-- Bug #18568
+--
+-- Partitionwise aggregate (full or partial) should not be used when a
+-- partition key's collation doesn't match that of the GROUP BY column it is
+-- matched with.
+SET max_parallel_workers_per_gather TO 0;
+SET enable_incremental_sort TO off;
+CREATE TABLE pagg_tab3 (c text collate case_insensitive) PARTITION BY LIST(c collate "C");
+CREATE TABLE pagg_tab3_p1 PARTITION OF pagg_tab3 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab3_p2 PARTITION OF pagg_tab3 FOR VALUES IN ('B', 'A');
+INSERT INTO pagg_tab3 SELECT substr('abAB', (i % 4) +1 , 1) FROM generate_series(0, 7) i;
+ANALYZE pagg_tab3;
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ QUERY PLAN
+-----------------------------------------------------------
+ Sort
+ Sort Key: (upper(pagg_tab3.c)) COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: pagg_tab3.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3_1
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_2
+(7 rows)
+
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ upper | count
+-------+-------
+ A | 4
+ B | 4
+(2 rows)
+
+-- No partitionwise aggregation allowed!
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ QUERY PLAN
+-----------------------------------------------------------
+ Sort
+ Sort Key: (upper(pagg_tab3.c)) COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: pagg_tab3.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3_1
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_2
+(7 rows)
+
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ upper | count
+-------+-------
+ A | 4
+ B | 4
+(2 rows)
+
+-- OK to use full partitionwise aggregate after changing the GROUP BY column's
+-- collation to be the same as that of the partition key.
+EXPLAIN (COSTS OFF)
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+ QUERY PLAN
+--------------------------------------------------------
+ Sort
+ Sort Key: ((pagg_tab3.c)::text) COLLATE "C"
+ -> Append
+ -> HashAggregate
+ Group Key: (pagg_tab3.c)::text
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3
+ -> HashAggregate
+ Group Key: (pagg_tab3_1.c)::text
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_1
+(9 rows)
+
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+ c | count
+---+-------
+ A | 2
+ B | 2
+ a | 2
+ b | 2
+(4 rows)
+
+DROP TABLE pagg_tab3;
+RESET enable_partitionwise_aggregate;
+RESET max_parallel_workers_per_gather;
+RESET enable_incremental_sort;
-- cleanup
RESET search_path;
SET client_min_messages TO warning;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index 5f2c0cf578..670eb98906 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -1507,14 +1507,3 @@ SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) <
-> Seq Scan on pagg_tab_para_p3 pagg_tab_para_2
(15 rows)
-SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3;
- x | sum | avg | count
-----+------+--------------------+-------
- 0 | 5000 | 5.0000000000000000 | 1000
- 1 | 6000 | 6.0000000000000000 | 1000
- 10 | 5000 | 5.0000000000000000 | 1000
- 11 | 6000 | 6.0000000000000000 | 1000
- 20 | 5000 | 5.0000000000000000 | 1000
- 21 | 6000 | 6.0000000000000000 | 1000
-(6 rows)
-
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 80f28a97d7..ca1bb7c1f2 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -796,6 +796,43 @@ INSERT INTO test33 VALUES (2, 'DEF');
-- they end up in the same partition (but it's platform-dependent which one)
SELECT (SELECT count(*) FROM test33_0) <> (SELECT count(*) FROM test33_1);
+--
+-- Bug #18568
+--
+-- Partitionwise aggregate (full or partial) should not be used when a
+-- partition key's collation doesn't match that of the GROUP BY column it is
+-- matched with.
+SET max_parallel_workers_per_gather TO 0;
+SET enable_incremental_sort TO off;
+
+CREATE TABLE pagg_tab3 (c text collate case_insensitive) PARTITION BY LIST(c collate "C");
+CREATE TABLE pagg_tab3_p1 PARTITION OF pagg_tab3 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab3_p2 PARTITION OF pagg_tab3 FOR VALUES IN ('B', 'A');
+INSERT INTO pagg_tab3 SELECT substr('abAB', (i % 4) +1 , 1) FROM generate_series(0, 7) i;
+ANALYZE pagg_tab3;
+
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+
+-- No partitionwise aggregation allowed!
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+
+-- OK to use full partitionwise aggregate after changing the GROUP BY column's
+-- collation to be the same as that of the partition key.
+EXPLAIN (COSTS OFF)
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+
+DROP TABLE pagg_tab3;
+
+RESET enable_partitionwise_aggregate;
+RESET max_parallel_workers_per_gather;
+RESET enable_incremental_sort;
-- cleanup
RESET search_path;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee24..1e263c1caf 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -333,4 +333,3 @@ RESET parallel_setup_cost;
EXPLAIN (COSTS OFF)
SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3;
-SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3;
--
2.43.0
Hi,
On Wed, Oct 23, 2024 at 10:48 PM Tender Wang <tndrwang@gmail.com> wrote:
I think the root cause of this thread and [1] are same. We don't use the Partition Key collation but column's
collation to fill the RelOptInfo partexprs field in set_baserel_partition_key_exprs().
If the Partition Key definition is same as column definition, which most times is,
that will be ok. But if it's not, this thread issue will arise.As far as I know, only partition pruning logic considers not only call equal(), but also check collation match.
Other codes only call equal() to check if the exprs match the partition key.
For example, in this thread case, match_expr_to_partition_keys() think the expr match the partition key:
if (equal(lfirst(lc), expr))
return cnt;Although We can fix this issue like [1], I think why not directly use the partkey->partcollation[cnt], which its value is
same with pg_partitioned_table's partcollation. I tried this to fix [1], but at that time, I was unsure if it was the correct fix.
I think it would be better to keep RelOptInfo.partexprs unchanged in
these fixes. I haven't been able to come up with a way to "assign"
the correct collation to partition keys that are not simple column
references. Checking PartitionScheme.partcollation separately is
enough to fix these bugs and aligns with the style of existing code,
such as match_clause_to_partition_key() in partprune.c.
--
Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> 于2024年11月1日周五 15:27写道:
Hi,
On Wed, Oct 23, 2024 at 10:48 PM Tender Wang <tndrwang@gmail.com> wrote:
I think the root cause of this thread and [1] are same. We don't use
the Partition Key collation but column's
collation to fill the RelOptInfo partexprs field in
set_baserel_partition_key_exprs().
If the Partition Key definition is same as column definition, which
most times is,
that will be ok. But if it's not, this thread issue will arise.
As far as I know, only partition pruning logic considers not only call
equal(), but also check collation match.
Other codes only call equal() to check if the exprs match the partition
key.
For example, in this thread case, match_expr_to_partition_keys() think
the expr match the partition key:
if (equal(lfirst(lc), expr))
return cnt;Although We can fix this issue like [1], I think why not directly use
the partkey->partcollation[cnt], which its value is
same with pg_partitioned_table's partcollation. I tried this to fix [1],
but at that time, I was unsure if it was the correct fix.
I think it would be better to keep RelOptInfo.partexprs unchanged in
these fixes. I haven't been able to come up with a way to "assign"
the correct collation to partition keys that are not simple column
references. Checking PartitionScheme.partcollation separately is
enough to fix these bugs and aligns with the style of existing code,
such as match_clause_to_partition_key() in partprune.c.
Agree. I can't figure out another solution.
--
Thanks,
Tender Wang
just a quick reply while testing v4-0001.
tests copy from src/test/regress/sql/partition_aggregate.sql first 40 lines.
drop table if exists pagg_tab;
CREATE TABLE pagg_tab (a int, b int, c text, d int) PARTITION BY
LIST(c collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000',
'0001', '0002', '0003', '0004');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0005',
'0006', '0007', '0008');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0009',
'0010', '0011');
INSERT INTO pagg_tab SELECT (i % 20), (i % 30), to_char(i % 12,
'FM0000'), i % 30 FROM generate_series(0, 2999) i;
ANALYZE pagg_tab;
EXPLAIN (COSTS OFF, settings)
SELECT a, sum(b), avg(b), count(*), max(b) FROM pagg_tab GROUP BY a;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize HashAggregate
Group Key: pagg_tab.a
-> Append
-> Partial HashAggregate
Group Key: pagg_tab.a
-> Seq Scan on pagg_tab_p1 pagg_tab
-> Partial HashAggregate
Group Key: pagg_tab_1.a
-> Seq Scan on pagg_tab_p2 pagg_tab_1
-> Partial HashAggregate
Group Key: pagg_tab_2.a
-> Seq Scan on pagg_tab_p3 pagg_tab_2
Settings: enable_partitionwise_aggregate = 'on',
enable_partitionwise_join = 'on', max_parallel_workers_per_gather =
'0', enable_increm
ental_sort = 'off'
drop table if exists pagg_tab;
CREATE TABLE pagg_tab (a text, b int, c text, d int) PARTITION BY
LIST(c collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000',
'0001', '0002', '0003', '0004');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0005',
'0006', '0007', '0008');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0009',
'0010', '0011');
INSERT INTO pagg_tab SELECT (i % 20)::text, (i % 30), to_char(i % 12,
'FM0000'), i % 30 FROM generate_series(0, 2999) i;
ANALYZE pagg_tab;
EXPLAIN (COSTS OFF, settings)
SELECT a, sum(b), avg(b), count(*), max(b) FROM pagg_tab GROUP BY a;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate
Group Key: pagg_tab.a
-> Append
-> Seq Scan on pagg_tab_p1 pagg_tab_1
-> Seq Scan on pagg_tab_p2 pagg_tab_2
-> Seq Scan on pagg_tab_p3 pagg_tab_3
Settings: enable_partitionwise_aggregate = 'on',
enable_partitionwise_join = 'on', max_parallel_workers_per_gather =
'0', enable_increm
ental_sort = 'off'
it seems "PARTITION BY LIST(c collate "C");" collation compare with
"GROUP BY a;".
set collation_incompatible returned true.
make it cannot do PARTITIONWISE_AGGREGATE_PARTIAL.
but here "group by a", "a" is text data type, we can still do
PARTITIONWISE_AGGREGATE_PARTIAL
?
On Fri, Nov 1, 2024 at 5:08 PM jian he <jian.universality@gmail.com> wrote:
just a quick reply while testing v4-0001.
tests copy from src/test/regress/sql/partition_aggregate.sql first 40 lines.drop table if exists pagg_tab;
CREATE TABLE pagg_tab (a int, b int, c text, d int) PARTITION BY
LIST(c collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000',
'0001', '0002', '0003', '0004');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0005',
'0006', '0007', '0008');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0009',
'0010', '0011');
INSERT INTO pagg_tab SELECT (i % 20), (i % 30), to_char(i % 12,
'FM0000'), i % 30 FROM generate_series(0, 2999) i;
ANALYZE pagg_tab;EXPLAIN (COSTS OFF, settings)
SELECT a, sum(b), avg(b), count(*), max(b) FROM pagg_tab GROUP BY a;QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize HashAggregate
Group Key: pagg_tab.a
-> Append
-> Partial HashAggregate
Group Key: pagg_tab.a
-> Seq Scan on pagg_tab_p1 pagg_tab
-> Partial HashAggregate
Group Key: pagg_tab_1.a
-> Seq Scan on pagg_tab_p2 pagg_tab_1
-> Partial HashAggregate
Group Key: pagg_tab_2.a
-> Seq Scan on pagg_tab_p3 pagg_tab_2
Settings: enable_partitionwise_aggregate = 'on',
enable_partitionwise_join = 'on', max_parallel_workers_per_gather =
'0', enable_increm
ental_sort = 'off'drop table if exists pagg_tab;
CREATE TABLE pagg_tab (a text, b int, c text, d int) PARTITION BY
LIST(c collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000',
'0001', '0002', '0003', '0004');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0005',
'0006', '0007', '0008');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0009',
'0010', '0011');
INSERT INTO pagg_tab SELECT (i % 20)::text, (i % 30), to_char(i % 12,
'FM0000'), i % 30 FROM generate_series(0, 2999) i;
ANALYZE pagg_tab;
EXPLAIN (COSTS OFF, settings)
SELECT a, sum(b), avg(b), count(*), max(b) FROM pagg_tab GROUP BY a;QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate
Group Key: pagg_tab.a
-> Append
-> Seq Scan on pagg_tab_p1 pagg_tab_1
-> Seq Scan on pagg_tab_p2 pagg_tab_2
-> Seq Scan on pagg_tab_p3 pagg_tab_3
Settings: enable_partitionwise_aggregate = 'on',
enable_partitionwise_join = 'on', max_parallel_workers_per_gather =
'0', enable_increm
ental_sort = 'off'it seems "PARTITION BY LIST(c collate "C");" collation compare with
"GROUP BY a;".
set collation_incompatible returned true.
make it cannot do PARTITIONWISE_AGGREGATE_PARTIAL.but here "group by a", "a" is text data type, we can still do
PARTITIONWISE_AGGREGATE_PARTIAL
?
Good catch. Looks like I added a bug in group_by_has_partkey() --
collation_incompatible should be set only when a grouping expression
matches a partition key.
--
Thanks, Amit Langote
Attachments:
v5-0002-Disallow-partitionwise-join-when-collation-doesn-.patchapplication/octet-stream; name=v5-0002-Disallow-partitionwise-join-when-collation-doesn-.patchDownload
From f3bd854e7ee04653a162c8980171fb4591813901 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 1 Nov 2024 15:32:33 +0900
Subject: [PATCH v5 2/2] Disallow partitionwise join when collation doesn't
match
Insist that the collation used for joining matches exactly with the
collation used for partitioning to allow using partitionwise join.
Reported-by: Tender Wang <tndrwang@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org
Discussion: https://postgr.es/m/tencent_9D9103CDA420C07768349CC1DFF88465F90A@qq.com
Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
---
src/backend/optimizer/util/relnode.c | 28 +++-
.../regress/expected/collate.icu.utf8.out | 154 ++++++++++++++++++
src/test/regress/sql/collate.icu.utf8.sql | 35 ++++
3 files changed, 215 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d7266e4cdb..489fe6c26f 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -2185,6 +2185,10 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
if (pk_known_equal[ipk1])
continue;
+ /* Reject if the partition key collation differs from the clause's. */
+ if (rel1->part_scheme->partcollation[ipk1] != opexpr->inputcollid)
+ return false;
+
/*
* The clause allows partitionwise join only if it uses the same
* operator family as that specified by the partition key.
@@ -2258,6 +2262,8 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
{
Node *expr1 = (Node *) lfirst(lc);
ListCell *lc2;
+ Oid partcoll1 = rel1->part_scheme->partcollation[ipk];
+ Oid exprcoll1 = exprCollation(expr1);
foreach(lc2, rel2->partexprs[ipk])
{
@@ -2265,8 +2271,26 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
if (exprs_known_equal(root, expr1, expr2, btree_opfamily))
{
- pk_known_equal[ipk] = true;
- break;
+ /*
+ * Ensure that the collation of the expression matches
+ * that of the partition key. Checking just one collation
+ * (partcoll1 and exprcoll1) suffices because partcoll1
+ * and partcoll2, as well as exprcoll1 and exprcoll2,
+ * should be identical. This holds because both rel1 and
+ * rel2 use the same PartitionScheme and expr1 and expr2
+ * are equal.
+ */
+ if (partcoll1 == exprcoll1)
+ {
+ Oid partcoll2 PG_USED_FOR_ASSERTS_ONLY =
+ rel1->part_scheme->partcollation[ipk];
+ Oid exprcoll2 PG_USED_FOR_ASSERTS_ONLY =
+ exprCollation(expr2);
+
+ Assert(partcoll2 == exprcoll2);
+ pk_known_equal[ipk] = true;
+ break;
+ }
}
}
if (pk_known_equal[ipk])
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 1ec9a2548b..e6af9b405f 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2164,7 +2164,161 @@ SELECT a, count(c collate "C") FROM pagg_tab3 GROUP BY a ORDER BY 1;
4 | 3
(4 rows)
+-- Partitionwise join should not be allowed too when the collation used by the
+-- join keys doesn't match the partition key collation.
+SET enable_partitionwise_join TO false;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: t1.c
+ -> Hash Join
+ Hash Cond: (t1.c = t2.c)
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t1_1
+ -> Seq Scan on pagg_tab3_p1 t1_2
+ -> Hash
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(13 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+ c | count
+---+-------
+ A | 36
+ B | 36
+(2 rows)
+
+SET enable_partitionwise_join TO true;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: t1.c
+ -> Hash Join
+ Hash Cond: (t1.c = t2.c)
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t1_1
+ -> Seq Scan on pagg_tab3_p1 t1_2
+ -> Hash
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(13 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+ c | count
+---+-------
+ A | 36
+ B | 36
+(2 rows)
+
+-- OK when the join clause uses the same collation as the partition key.
+EXPLAIN (COSTS OFF)
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY 1;
+ QUERY PLAN
+------------------------------------------------------------------
+ Sort
+ Sort Key: ((t1.c)::text) COLLATE "C"
+ -> Append
+ -> HashAggregate
+ Group Key: (t1.c)::text
+ -> Hash Join
+ Hash Cond: ((t1.c)::text = (t2.c)::text)
+ -> Seq Scan on pagg_tab3_p2 t1
+ -> Hash
+ -> Seq Scan on pagg_tab3_p2 t2
+ -> HashAggregate
+ Group Key: (t1_1.c)::text
+ -> Hash Join
+ Hash Cond: ((t1_1.c)::text = (t2_1.c)::text)
+ -> Seq Scan on pagg_tab3_p1 t1_1
+ -> Hash
+ -> Seq Scan on pagg_tab3_p1 t2_1
+(17 rows)
+
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY 1;
+ c | count
+---+-------
+ A | 9
+ B | 9
+ a | 9
+ b | 9
+(4 rows)
+
+-- Another case where the partition keys are matched via equivalence class,
+-- not a join restriction clause.
+CREATE TABLE pagg_tab4 (c text collate case_insensitive, b text collate case_insensitive) PARTITION BY LIST(b collate "C");
+CREATE TABLE pagg_tab4_p1 PARTITION OF pagg_tab4 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab4_p2 PARTITION OF pagg_tab4 FOR VALUES IN ('B', 'A');
+INSERT INTO pagg_tab4 (b, c) SELECT substr('abAB', (i % 4) + 1 , 1), substr('abAB', (i % 2) + 1 , 1) FROM generate_series(0, 11) i;
+ANALYZE pagg_tab4;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY 1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: t1.c
+ -> Hash Join
+ Hash Cond: (t1.c = t2.c)
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t1_1
+ -> Seq Scan on pagg_tab3_p1 t1_2
+ -> Hash
+ -> Append
+ -> Seq Scan on pagg_tab4_p2 t2_1
+ Filter: (c = b)
+ -> Seq Scan on pagg_tab4_p1 t2_2
+ Filter: (c = b)
+(15 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY 1;
+ c | count
+---+-------
+ A | 36
+ B | 36
+(2 rows)
+
+-- OK when the join clause uses the same collation as the partition key
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY 1 ORDER BY 1;
+ QUERY PLAN
+------------------------------------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: t1.c
+ -> Append
+ -> Hash Join
+ Hash Cond: ((t1_1.c = t2_1.c) AND ((t1_1.c)::text = (t2_1.b)::text))
+ -> Seq Scan on pagg_tab3_p2 t1_1
+ -> Hash
+ -> Seq Scan on pagg_tab4_p2 t2_1
+ -> Hash Join
+ Hash Cond: ((t1_2.c = t2_2.c) AND ((t1_2.c)::text = (t2_2.b)::text))
+ -> Seq Scan on pagg_tab3_p1 t1_2
+ -> Hash
+ -> Seq Scan on pagg_tab4_p1 t2_2
+(15 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY 1 ORDER BY 1;
+ c | count
+---+-------
+ A | 18
+ B | 18
+(2 rows)
+
DROP TABLE pagg_tab3;
+DROP TABLE pagg_tab4;
RESET enable_partitionwise_aggregate;
RESET max_parallel_workers_per_gather;
RESET enable_incremental_sort;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 792bc94361..407e5e5b21 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -834,7 +834,42 @@ EXPLAIN (COSTS OFF)
SELECT a, count(c collate "C") FROM pagg_tab3 GROUP BY a ORDER BY 1;
SELECT a, count(c collate "C") FROM pagg_tab3 GROUP BY a ORDER BY 1;
+-- Partitionwise join should not be allowed too when the collation used by the
+-- join keys doesn't match the partition key collation.
+SET enable_partitionwise_join TO false;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+
+SET enable_partitionwise_join TO true;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+
+-- OK when the join clause uses the same collation as the partition key.
+EXPLAIN (COSTS OFF)
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY 1;
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY 1;
+
+-- Another case where the partition keys are matched via equivalence class,
+-- not a join restriction clause.
+CREATE TABLE pagg_tab4 (c text collate case_insensitive, b text collate case_insensitive) PARTITION BY LIST(b collate "C");
+CREATE TABLE pagg_tab4_p1 PARTITION OF pagg_tab4 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab4_p2 PARTITION OF pagg_tab4 FOR VALUES IN ('B', 'A');
+INSERT INTO pagg_tab4 (b, c) SELECT substr('abAB', (i % 4) + 1 , 1), substr('abAB', (i % 2) + 1 , 1) FROM generate_series(0, 11) i;
+ANALYZE pagg_tab4;
+
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY 1;
+
+-- OK when the join clause uses the same collation as the partition key
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY 1 ORDER BY 1;
+
DROP TABLE pagg_tab3;
+DROP TABLE pagg_tab4;
RESET enable_partitionwise_aggregate;
RESET max_parallel_workers_per_gather;
--
2.43.0
v5-0001-Disallow-partitionwise-grouping-when-collation-do.patchapplication/octet-stream; name=v5-0001-Disallow-partitionwise-grouping-when-collation-do.patchDownload
From 9ac55edd5118bed524c1671421cb81fa705e9920 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 1 Nov 2024 16:15:50 +0900
Subject: [PATCH v5 1/2] Disallow partitionwise grouping when collation doesn't
match
Insist that the collation used for grouping matches exactly with the
collation used for partitioning to allow using either full or
partial partitionwise grouping.
Bug: #18568
Reported-by: Webbo Han <1105066510@qq.com>
Author: Webbo Han <1105066510@qq.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org
Discussion: https://postgr.es/m/tencent_9D9103CDA420C07768349CC1DFF88465F90A@qq.com
Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
---
src/backend/optimizer/plan/planner.c | 67 +++++++---
.../regress/expected/collate.icu.utf8.out | 114 ++++++++++++++++++
src/test/regress/sql/collate.icu.utf8.sql | 43 +++++++
3 files changed, 210 insertions(+), 14 deletions(-)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0f423e9684..24a3ff087e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -253,7 +253,8 @@ static void create_partitionwise_grouping_paths(PlannerInfo *root,
GroupPathExtraData *extra);
static bool group_by_has_partkey(RelOptInfo *input_rel,
List *targetList,
- List *groupClause);
+ List *groupClause,
+ bool *collation_incompatible);
static int common_prefix_cmp(const void *a, const void *b);
static List *generate_setop_child_grouplist(SetOperationStmt *op,
List *targetlist);
@@ -4090,23 +4091,30 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
if (extra->patype != PARTITIONWISE_AGGREGATE_NONE &&
IS_PARTITIONED_REL(input_rel))
{
+ bool collation_incompatible = false;
+ bool group_by_contains_partkey =
+ group_by_has_partkey(input_rel, extra->targetList,
+ root->parse->groupClause,
+ &collation_incompatible);
+
/*
* If this is the topmost relation or if the parent relation is doing
* full partitionwise aggregation, then we can do full partitionwise
* aggregation provided that the GROUP BY clause contains all of the
- * partitioning columns at this level. Otherwise, we can do at most
- * partial partitionwise aggregation. But if partial aggregation is
- * not supported in general then we can't use it for partitionwise
- * aggregation either.
+ * partitioning columns at this level and the collation used by GROUP
+ * BY matches the partitioning collation. Otherwise, we can do at most
+ * partial partitionwise aggregation, but again only if the collation
+ * is compatible. If partial aggregation is not supported in general
+ * then we can't use it for partitionwise aggregation either.
*
* Check parse->groupClause not processed_groupClause, because it's
* okay if some of the partitioning columns were proved redundant.
*/
if (extra->patype == PARTITIONWISE_AGGREGATE_FULL &&
- group_by_has_partkey(input_rel, extra->targetList,
- root->parse->groupClause))
+ group_by_contains_partkey)
patype = PARTITIONWISE_AGGREGATE_FULL;
- else if ((extra->flags & GROUPING_CAN_PARTIAL_AGG) != 0)
+ else if ((extra->flags & GROUPING_CAN_PARTIAL_AGG) != 0 &&
+ !collation_incompatible)
patype = PARTITIONWISE_AGGREGATE_PARTIAL;
else
patype = PARTITIONWISE_AGGREGATE_NONE;
@@ -8105,13 +8113,18 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
/*
* group_by_has_partkey
*
- * Returns true, if all the partition keys of the given relation are part of
- * the GROUP BY clauses, false otherwise.
+ * Returns true if all the partition keys of the given relation are part of
+ * the GROUP BY clauses, including having matching collation, false otherwise.
+ *
+ * Returns false also if a collation mismatch is detected between a partition
+ * key and its corresponding expression in groupClause, in which case.
+ * *collation_incompatible is set to true.
*/
static bool
group_by_has_partkey(RelOptInfo *input_rel,
List *targetList,
- List *groupClause)
+ List *groupClause,
+ bool *collation_incompatible)
{
List *groupexprs = get_sortgrouplist_exprs(groupClause, targetList);
int cnt = 0;
@@ -8134,13 +8147,39 @@ group_by_has_partkey(RelOptInfo *input_rel,
foreach(lc, partexprs)
{
+ ListCell *lg;
Expr *partexpr = lfirst(lc);
+ Oid partcoll = input_rel->part_scheme->partcollation[cnt];
- if (list_member(groupexprs, partexpr))
+ foreach(lg, groupexprs)
{
- found = true;
- break;
+ Expr *groupexpr = lfirst(lg);
+ Oid groupcoll = exprCollation((Node *) groupexpr);
+
+ if (IsA(groupexpr, RelabelType))
+ groupexpr = ((RelabelType *) groupexpr)->arg;
+
+ if (equal(groupexpr, partexpr))
+ {
+
+ /*
+ * Reject a match if the grouping collation does not match
+ * the partitioning collation.
+ */
+ if (OidIsValid(partcoll) && OidIsValid(groupcoll) &&
+ partcoll != groupcoll)
+ {
+ *collation_incompatible = true;
+ return false;
+ }
+
+ found = true;
+ break;
+ }
}
+
+ if (found)
+ break;
}
/*
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index faa376e060..1ec9a2548b 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2054,6 +2054,120 @@ SELECT (SELECT count(*) FROM test33_0) <> (SELECT count(*) FROM test33_1);
t
(1 row)
+--
+-- Bug #18568
+--
+-- Partitionwise aggregate (full or partial) should not be used when a
+-- partition key's collation doesn't match that of the GROUP BY column it is
+-- matched with.
+SET max_parallel_workers_per_gather TO 0;
+SET enable_incremental_sort TO off;
+CREATE TABLE pagg_tab3 (a text, c text collate case_insensitive) PARTITION BY LIST(c collate "C");
+CREATE TABLE pagg_tab3_p1 PARTITION OF pagg_tab3 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab3_p2 PARTITION OF pagg_tab3 FOR VALUES IN ('B', 'A');
+INSERT INTO pagg_tab3 SELECT i % 4 + 1, substr('abAB', (i % 4) + 1 , 1) FROM generate_series(0, 11) i;
+ANALYZE pagg_tab3;
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ QUERY PLAN
+-----------------------------------------------------------
+ Sort
+ Sort Key: (upper(pagg_tab3.c)) COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: pagg_tab3.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3_1
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_2
+(7 rows)
+
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ upper | count
+-------+-------
+ A | 6
+ B | 6
+(2 rows)
+
+-- No partitionwise aggregation allowed!
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ QUERY PLAN
+-----------------------------------------------------------
+ Sort
+ Sort Key: (upper(pagg_tab3.c)) COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: pagg_tab3.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3_1
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_2
+(7 rows)
+
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ upper | count
+-------+-------
+ A | 6
+ B | 6
+(2 rows)
+
+-- OK to use full partitionwise aggregate after changing the GROUP BY column's
+-- collation to be the same as that of the partition key.
+EXPLAIN (COSTS OFF)
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+ QUERY PLAN
+--------------------------------------------------------
+ Sort
+ Sort Key: ((pagg_tab3.c)::text) COLLATE "C"
+ -> Append
+ -> HashAggregate
+ Group Key: (pagg_tab3.c)::text
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3
+ -> HashAggregate
+ Group Key: (pagg_tab3_1.c)::text
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_1
+(9 rows)
+
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+ c | count
+---+-------
+ A | 3
+ B | 3
+ a | 3
+ b | 3
+(4 rows)
+
+-- Also OK to use partial partitionwise aggregate when grouping columns do not
+-- include partition key columns
+EXPLAIN (COSTS OFF)
+SELECT a, count(c collate "C") FROM pagg_tab3 GROUP BY a ORDER BY 1;
+ QUERY PLAN
+--------------------------------------------------------------
+ Finalize GroupAggregate
+ Group Key: pagg_tab3.a
+ -> Sort
+ Sort Key: pagg_tab3.a
+ -> Append
+ -> Partial HashAggregate
+ Group Key: pagg_tab3.a
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3
+ -> Partial HashAggregate
+ Group Key: pagg_tab3_1.a
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_1
+(11 rows)
+
+SELECT a, count(c collate "C") FROM pagg_tab3 GROUP BY a ORDER BY 1;
+ a | count
+---+-------
+ 1 | 3
+ 2 | 3
+ 3 | 3
+ 4 | 3
+(4 rows)
+
+DROP TABLE pagg_tab3;
+RESET enable_partitionwise_aggregate;
+RESET max_parallel_workers_per_gather;
+RESET enable_incremental_sort;
-- cleanup
RESET search_path;
SET client_min_messages TO warning;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 80f28a97d7..792bc94361 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -796,6 +796,49 @@ INSERT INTO test33 VALUES (2, 'DEF');
-- they end up in the same partition (but it's platform-dependent which one)
SELECT (SELECT count(*) FROM test33_0) <> (SELECT count(*) FROM test33_1);
+--
+-- Bug #18568
+--
+-- Partitionwise aggregate (full or partial) should not be used when a
+-- partition key's collation doesn't match that of the GROUP BY column it is
+-- matched with.
+SET max_parallel_workers_per_gather TO 0;
+SET enable_incremental_sort TO off;
+
+CREATE TABLE pagg_tab3 (a text, c text collate case_insensitive) PARTITION BY LIST(c collate "C");
+CREATE TABLE pagg_tab3_p1 PARTITION OF pagg_tab3 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab3_p2 PARTITION OF pagg_tab3 FOR VALUES IN ('B', 'A');
+INSERT INTO pagg_tab3 SELECT i % 4 + 1, substr('abAB', (i % 4) + 1 , 1) FROM generate_series(0, 11) i;
+ANALYZE pagg_tab3;
+
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+
+-- No partitionwise aggregation allowed!
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+
+-- OK to use full partitionwise aggregate after changing the GROUP BY column's
+-- collation to be the same as that of the partition key.
+EXPLAIN (COSTS OFF)
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+
+-- Also OK to use partial partitionwise aggregate when grouping columns do not
+-- include partition key columns
+EXPLAIN (COSTS OFF)
+SELECT a, count(c collate "C") FROM pagg_tab3 GROUP BY a ORDER BY 1;
+SELECT a, count(c collate "C") FROM pagg_tab3 GROUP BY a ORDER BY 1;
+
+DROP TABLE pagg_tab3;
+
+RESET enable_partitionwise_aggregate;
+RESET max_parallel_workers_per_gather;
+RESET enable_incremental_sort;
-- cleanup
RESET search_path;
--
2.43.0
hi.
about v5.
if (exprs_known_equal(root, expr1, expr2, btree_opfamily))
{
/*
* Ensure that the collation of the expression matches
* that of the partition key. Checking just one collation
* (partcoll1 and exprcoll1) suffices because partcoll1
* and partcoll2, as well as exprcoll1 and exprcoll2,
* should be identical. This holds because both rel1 and
* rel2 use the same PartitionScheme and expr1 and expr2
* are equal.
*/
if (partcoll1 == exprcoll1)
{
Oid partcoll2 PG_USED_FOR_ASSERTS_ONLY =
rel1->part_scheme->partcollation[ipk];
Oid exprcoll2 PG_USED_FOR_ASSERTS_ONLY =
exprCollation(expr2);
Assert(partcoll2 == exprcoll2);
pk_known_equal[ipk] = true;
if (OidIsValid(exprcoll1))
elog(INFO, "this path called %s:%d",
__FILE_NAME__, __LINE__);
break;
}
}
tests still passed, which means that we didn't have text data type as
partition key related tests for partition-wise join.
Do we need to add one?
+-- Another case where the partition keys are matched via equivalence class,
+-- not a join restriction clause.
+
+-- OK when the join clause uses the same collation as the partition key
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c
= t2.c AND t1.c = t2.b COLLATE "C" GROUP BY 1 ORDER BY 1;
i suppose, you comments is saying that in have_partkey_equi_join
the above query will return true via
`if (exprs_known_equal(root, expr1, expr2, btree_opfamily))`
But " t1.c = t2.b COLLATE "C" already in "restrictlist".
In have_partkey_equi_join loop through "restrictlist" would return
true for above query, won't reach exprs_known_equal.
Other than the comments that confused me, the test and the results
look fine to me.
some column collation is case_insensitive, ORDER BY that column would
render the output not deterministic.
like 'A' before 'a' and 'a' before 'A' are both correct.
it may cause regress tests to fail.
So I did some minor refactoring to make the "ORDER BY" deterministic.
Attachments:
v5-0001-make-partition-wise-partitoin-aggreagte-relate.no-cfbotapplication/octet-stream; name=v5-0001-make-partition-wise-partitoin-aggreagte-relate.no-cfbotDownload
From 1c02e15f99bb9981035760ccaf65f91975e6a088 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sat, 2 Nov 2024 15:40:55 +0800
Subject: [PATCH v5 1/1] make partition-wise, partitoin-aggreagte related test
deterministic
---
src/test/regress/expected/collate.icu.utf8.out | 18 +++++++++---------
src/test/regress/sql/collate.icu.utf8.sql | 12 ++++++------
2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index e6af9b405f..b2950a0634 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2195,11 +2195,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
-SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY t1.c ORDER BY t1.c collate "C";
QUERY PLAN
-------------------------------------------------------------
Sort
- Sort Key: t1.c COLLATE case_insensitive
+ Sort Key: t1.c COLLATE "C"
-> HashAggregate
Group Key: t1.c
-> Hash Join
@@ -2213,7 +2213,7 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p1 t2_2
(13 rows)
-SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY t1.c ORDER BY t1.c collate "C";
c | count
---+-------
A | 36
@@ -2261,11 +2261,11 @@ CREATE TABLE pagg_tab4_p2 PARTITION OF pagg_tab4 FOR VALUES IN ('B', 'A');
INSERT INTO pagg_tab4 (b, c) SELECT substr('abAB', (i % 4) + 1 , 1), substr('abAB', (i % 2) + 1 , 1) FROM generate_series(0, 11) i;
ANALYZE pagg_tab4;
EXPLAIN (COSTS OFF)
-SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY t1.c ORDER BY t1.c collate "C";
QUERY PLAN
-------------------------------------------------------------
Sort
- Sort Key: t1.c COLLATE case_insensitive
+ Sort Key: t1.c COLLATE "C"
-> HashAggregate
Group Key: t1.c
-> Hash Join
@@ -2281,7 +2281,7 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND
Filter: (c = b)
(15 rows)
-SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY t1.c ORDER BY t1.c collate "C";
c | count
---+-------
A | 36
@@ -2290,11 +2290,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND
-- OK when the join clause uses the same collation as the partition key
EXPLAIN (COSTS OFF)
-SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY t1.c ORDER BY t1.c collate "C";
QUERY PLAN
------------------------------------------------------------------------------------------
Sort
- Sort Key: t1.c COLLATE case_insensitive
+ Sort Key: t1.c COLLATE "C"
-> HashAggregate
Group Key: t1.c
-> Append
@@ -2310,7 +2310,7 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND
-> Seq Scan on pagg_tab4_p1 t2_2
(15 rows)
-SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY t1.c ORDER BY t1.c collate "C";
c | count
---+-------
A | 18
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 407e5e5b21..beae6603b1 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -843,8 +843,8 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
-SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
-SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY t1.c ORDER BY t1.c collate "C";
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY t1.c ORDER BY t1.c collate "C";
-- OK when the join clause uses the same collation as the partition key.
EXPLAIN (COSTS OFF)
@@ -860,13 +860,13 @@ INSERT INTO pagg_tab4 (b, c) SELECT substr('abAB', (i % 4) + 1 , 1), substr('abA
ANALYZE pagg_tab4;
EXPLAIN (COSTS OFF)
-SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY 1;
-SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY t1.c ORDER BY t1.c collate "C";
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY t1.c ORDER BY t1.c collate "C";
-- OK when the join clause uses the same collation as the partition key
EXPLAIN (COSTS OFF)
-SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY 1 ORDER BY 1;
-SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY 1 ORDER BY 1;
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY t1.c ORDER BY t1.c collate "C";
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY t1.c ORDER BY t1.c collate "C";
DROP TABLE pagg_tab3;
DROP TABLE pagg_tab4;
--
2.34.1
Hi,
On Mon, Nov 4, 2024 at 12:28 PM jian he <jian.universality@gmail.com> wrote:
hi.
about v5.
if (exprs_known_equal(root, expr1, expr2, btree_opfamily))
{
/*
* Ensure that the collation of the expression matches
* that of the partition key. Checking just one collation
* (partcoll1 and exprcoll1) suffices because partcoll1
* and partcoll2, as well as exprcoll1 and exprcoll2,
* should be identical. This holds because both rel1 and
* rel2 use the same PartitionScheme and expr1 and expr2
* are equal.
*/
if (partcoll1 == exprcoll1)
{
Oid partcoll2 PG_USED_FOR_ASSERTS_ONLY =
rel1->part_scheme->partcollation[ipk];
Oid exprcoll2 PG_USED_FOR_ASSERTS_ONLY =
exprCollation(expr2);
Assert(partcoll2 == exprcoll2);
pk_known_equal[ipk] = true;
if (OidIsValid(exprcoll1))
elog(INFO, "this path called %s:%d",
__FILE_NAME__, __LINE__);
break;
}
}tests still passed, which means that we didn't have text data type as
partition key related tests for partition-wise join.
Do we need to add one?+-- Another case where the partition keys are matched via equivalence class, +-- not a join restriction clause. + +-- OK when the join clause uses the same collation as the partition key +EXPLAIN (COSTS OFF) +SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b COLLATE "C" GROUP BY 1 ORDER BY 1;i suppose, you comments is saying that in have_partkey_equi_join
the above query will return true via
`if (exprs_known_equal(root, expr1, expr2, btree_opfamily))`
But " t1.c = t2.b COLLATE "C" already in "restrictlist".
In have_partkey_equi_join loop through "restrictlist" would return
true for above query, won't reach exprs_known_equal.Other than the comments that confused me, the test and the results
look fine to me.
Thanks, yes, a test case that exercises the partcoll1 == exprcoll1
code was missing.
some column collation is case_insensitive, ORDER BY that column would
render the output not deterministic.
like 'A' before 'a' and 'a' before 'A' are both correct.
it may cause regress tests to fail.
So I did some minor refactoring to make the "ORDER BY" deterministic.
Thanks, merged.
--
Thanks, Amit Langote
Attachments:
v6-0002-Disallow-partitionwise-join-when-collation-doesn-.patchapplication/octet-stream; name=v6-0002-Disallow-partitionwise-join-when-collation-doesn-.patchDownload
From 36274d54a2367ef2c249ac4c4d2762387d36175e Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 1 Nov 2024 15:32:33 +0900
Subject: [PATCH v6 2/2] Disallow partitionwise join when collation doesn't
match
Insist that the collation used for joining matches exactly with the
collation used for partitioning to allow using partitionwise join.
Reported-by: Tender Wang <tndrwang@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org
Discussion: https://postgr.es/m/tencent_9D9103CDA420C07768349CC1DFF88465F90A@qq.com
Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
---
src/backend/optimizer/util/relnode.c | 28 ++-
.../regress/expected/collate.icu.utf8.out | 209 ++++++++++++++++++
src/test/regress/sql/collate.icu.utf8.sql | 58 +++++
3 files changed, 293 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d7266e4cdb..489fe6c26f 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -2185,6 +2185,10 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
if (pk_known_equal[ipk1])
continue;
+ /* Reject if the partition key collation differs from the clause's. */
+ if (rel1->part_scheme->partcollation[ipk1] != opexpr->inputcollid)
+ return false;
+
/*
* The clause allows partitionwise join only if it uses the same
* operator family as that specified by the partition key.
@@ -2258,6 +2262,8 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
{
Node *expr1 = (Node *) lfirst(lc);
ListCell *lc2;
+ Oid partcoll1 = rel1->part_scheme->partcollation[ipk];
+ Oid exprcoll1 = exprCollation(expr1);
foreach(lc2, rel2->partexprs[ipk])
{
@@ -2265,8 +2271,26 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
if (exprs_known_equal(root, expr1, expr2, btree_opfamily))
{
- pk_known_equal[ipk] = true;
- break;
+ /*
+ * Ensure that the collation of the expression matches
+ * that of the partition key. Checking just one collation
+ * (partcoll1 and exprcoll1) suffices because partcoll1
+ * and partcoll2, as well as exprcoll1 and exprcoll2,
+ * should be identical. This holds because both rel1 and
+ * rel2 use the same PartitionScheme and expr1 and expr2
+ * are equal.
+ */
+ if (partcoll1 == exprcoll1)
+ {
+ Oid partcoll2 PG_USED_FOR_ASSERTS_ONLY =
+ rel1->part_scheme->partcollation[ipk];
+ Oid exprcoll2 PG_USED_FOR_ASSERTS_ONLY =
+ exprCollation(expr2);
+
+ Assert(partcoll2 == exprcoll2);
+ pk_known_equal[ipk] = true;
+ break;
+ }
}
}
if (pk_known_equal[ipk])
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index a4cd9ea085..e025c2518c 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2164,7 +2164,216 @@ SELECT a collate "C", count(c collate "C") FROM pagg_tab3 GROUP BY a collate "C"
4 | 3
(4 rows)
+-- Partitionwise join should not be allowed too when the collation used by the
+-- join keys doesn't match the partition key collation.
+SET enable_partitionwise_join TO false;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
+ QUERY PLAN
+-------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE "C"
+ -> HashAggregate
+ Group Key: t1.c
+ -> Hash Join
+ Hash Cond: (t1.c = t2.c)
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t1_1
+ -> Seq Scan on pagg_tab3_p1 t1_2
+ -> Hash
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(13 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
+ c | count
+---+-------
+ A | 36
+ B | 36
+(2 rows)
+
+SET enable_partitionwise_join TO true;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
+ QUERY PLAN
+-------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE "C"
+ -> HashAggregate
+ Group Key: t1.c
+ -> Hash Join
+ Hash Cond: (t1.c = t2.c)
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t1_1
+ -> Seq Scan on pagg_tab3_p1 t1_2
+ -> Hash
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(13 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
+ c | count
+---+-------
+ A | 36
+ B | 36
+(2 rows)
+
+-- OK when the join clause uses the same collation as the partition key.
+EXPLAIN (COSTS OFF)
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY t1.c COLLATE "C";
+ QUERY PLAN
+------------------------------------------------------------------
+ Sort
+ Sort Key: ((t1.c)::text) COLLATE "C"
+ -> Append
+ -> HashAggregate
+ Group Key: (t1.c)::text
+ -> Hash Join
+ Hash Cond: ((t1.c)::text = (t2.c)::text)
+ -> Seq Scan on pagg_tab3_p2 t1
+ -> Hash
+ -> Seq Scan on pagg_tab3_p2 t2
+ -> HashAggregate
+ Group Key: (t1_1.c)::text
+ -> Hash Join
+ Hash Cond: ((t1_1.c)::text = (t2_1.c)::text)
+ -> Seq Scan on pagg_tab3_p1 t1_1
+ -> Hash
+ -> Seq Scan on pagg_tab3_p1 t2_1
+(17 rows)
+
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY t1.c COLLATE "C";
+ c | count
+---+-------
+ A | 9
+ B | 9
+ a | 9
+ b | 9
+(4 rows)
+
+-- Few other cases where the joined partition keys are matched via equivalence
+-- class, not a join restriction clause.
+-- Collations of joined columns match, but the partition keys collation is different
+CREATE TABLE pagg_tab4 (c text collate case_insensitive, b text collate case_insensitive) PARTITION BY LIST (b collate "C");
+CREATE TABLE pagg_tab4_p1 PARTITION OF pagg_tab4 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab4_p2 PARTITION OF pagg_tab4 FOR VALUES IN ('B', 'A');
+INSERT INTO pagg_tab4 (b, c) SELECT substr('abAB', (i % 4) + 1 , 1), substr('abAB', (i % 2) + 1 , 1) FROM generate_series(0, 11) i;
+ANALYZE pagg_tab4;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY t1.c COLLATE "C";
+ QUERY PLAN
+-------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE "C"
+ -> HashAggregate
+ Group Key: t1.c
+ -> Hash Join
+ Hash Cond: (t1.c = t2.c)
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t1_1
+ -> Seq Scan on pagg_tab3_p1 t1_2
+ -> Hash
+ -> Append
+ -> Seq Scan on pagg_tab4_p2 t2_1
+ Filter: (c = b)
+ -> Seq Scan on pagg_tab4_p1 t2_2
+ Filter: (c = b)
+(15 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY t1.c COLLATE "C";
+ c | count
+---+-------
+ A | 36
+ B | 36
+(2 rows)
+
+-- OK when the partition key collation is same as that of the join columns
+CREATE TABLE pagg_tab5 (c text collate case_insensitive, b text collate case_insensitive) PARTITION BY LIST (c collate case_insensitive);
+CREATE TABLE pagg_tab5_p1 PARTITION OF pagg_tab5 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab5_p2 PARTITION OF pagg_tab5 FOR VALUES IN ('c', 'd');
+INSERT INTO pagg_tab5 (b, c) SELECT substr('abAB', (i % 4) + 1 , 1), substr('abAB', (i % 2) + 1 , 1) FROM generate_series(0, 5) i;
+INSERT INTO pagg_tab5 (b, c) SELECT substr('cdCD', (i % 4) + 1 , 1), substr('cdCD', (i % 2) + 1 , 1) FROM generate_series(0, 5) i;
+ANALYZE pagg_tab5;
+CREATE TABLE pagg_tab6 (c text collate case_insensitive, b text collate case_insensitive) PARTITION BY LIST (b collate case_insensitive);
+CREATE TABLE pagg_tab6_p1 PARTITION OF pagg_tab6 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab6_p2 PARTITION OF pagg_tab6 FOR VALUES IN ('c', 'd');
+INSERT INTO pagg_tab6 (b, c) SELECT substr('abAB', (i % 4) + 1 , 1), substr('abAB', (i % 2) + 1 , 1) FROM generate_series(0, 5) i;
+INSERT INTO pagg_tab6 (b, c) SELECT substr('cdCD', (i % 4) + 1 , 1), substr('cdCD', (i % 2) + 1 , 1) FROM generate_series(0, 5) i;
+ANALYZE pagg_tab5;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab5 t1 JOIN pagg_tab6 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY t1.c COLLATE "C";
+ QUERY PLAN
+-------------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE "C"
+ -> GroupAggregate
+ Group Key: t1.c
+ -> Sort
+ Sort Key: t1.c COLLATE case_insensitive
+ -> Append
+ -> Hash Join
+ Hash Cond: (t2_1.c = t1_1.c)
+ -> Seq Scan on pagg_tab6_p1 t2_1
+ Filter: (c = b)
+ -> Hash
+ -> Seq Scan on pagg_tab5_p1 t1_1
+ -> Hash Join
+ Hash Cond: (t2_2.c = t1_2.c)
+ -> Seq Scan on pagg_tab6_p2 t2_2
+ Filter: (c = b)
+ -> Hash
+ -> Seq Scan on pagg_tab5_p2 t1_2
+(19 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab5 t1 JOIN pagg_tab6 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY t1.c COLLATE "C";
+ c | count
+---+-------
+ a | 9
+ b | 9
+ c | 9
+ d | 9
+(4 rows)
+
+SET enable_partitionwise_join TO false;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab5 t1 JOIN pagg_tab6 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY t1.c COLLATE "C";
+ QUERY PLAN
+-------------------------------------------------------------
+ Sort
+ Sort Key: t1.c COLLATE "C"
+ -> GroupAggregate
+ Group Key: t1.c
+ -> Merge Join
+ Merge Cond: (t2.c = t1.c)
+ -> Sort
+ Sort Key: t2.c COLLATE case_insensitive
+ -> Append
+ -> Seq Scan on pagg_tab6_p1 t2_1
+ Filter: (c = b)
+ -> Seq Scan on pagg_tab6_p2 t2_2
+ Filter: (c = b)
+ -> Sort
+ Sort Key: t1.c COLLATE case_insensitive
+ -> Append
+ -> Seq Scan on pagg_tab5_p1 t1_1
+ -> Seq Scan on pagg_tab5_p2 t1_2
+(18 rows)
+
+SELECT t1.c, count(t2.c) FROM pagg_tab5 t1 JOIN pagg_tab6 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY t1.c COLLATE "C";
+ c | count
+---+-------
+ a | 9
+ b | 9
+ c | 9
+ d | 9
+(4 rows)
+
DROP TABLE pagg_tab3;
+DROP TABLE pagg_tab4;
+DROP TABLE pagg_tab5;
+DROP TABLE pagg_tab6;
RESET enable_partitionwise_aggregate;
RESET max_parallel_workers_per_gather;
RESET enable_incremental_sort;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index f523dd73be..bad92b2ab7 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -834,7 +834,65 @@ EXPLAIN (COSTS OFF)
SELECT a collate "C", count(c collate "C") FROM pagg_tab3 GROUP BY a collate "C" ORDER BY 1;
SELECT a collate "C", count(c collate "C") FROM pagg_tab3 GROUP BY a collate "C" ORDER BY 1;
+-- Partitionwise join should not be allowed too when the collation used by the
+-- join keys doesn't match the partition key collation.
+SET enable_partitionwise_join TO false;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
+
+SET enable_partitionwise_join TO true;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
+
+-- OK when the join clause uses the same collation as the partition key.
+EXPLAIN (COSTS OFF)
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY t1.c COLLATE "C";
+SELECT t1.c COLLATE "C", count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c COLLATE "C" GROUP BY t1.c COLLATE "C" ORDER BY t1.c COLLATE "C";
+
+-- Few other cases where the joined partition keys are matched via equivalence
+-- class, not a join restriction clause.
+
+-- Collations of joined columns match, but the partition keys collation is different
+CREATE TABLE pagg_tab4 (c text collate case_insensitive, b text collate case_insensitive) PARTITION BY LIST (b collate "C");
+CREATE TABLE pagg_tab4_p1 PARTITION OF pagg_tab4 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab4_p2 PARTITION OF pagg_tab4 FOR VALUES IN ('B', 'A');
+INSERT INTO pagg_tab4 (b, c) SELECT substr('abAB', (i % 4) + 1 , 1), substr('abAB', (i % 2) + 1 , 1) FROM generate_series(0, 11) i;
+ANALYZE pagg_tab4;
+
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY t1.c COLLATE "C";
+SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab4 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY t1.c COLLATE "C";
+
+-- OK when the partition key collation is same as that of the join columns
+CREATE TABLE pagg_tab5 (c text collate case_insensitive, b text collate case_insensitive) PARTITION BY LIST (c collate case_insensitive);
+CREATE TABLE pagg_tab5_p1 PARTITION OF pagg_tab5 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab5_p2 PARTITION OF pagg_tab5 FOR VALUES IN ('c', 'd');
+INSERT INTO pagg_tab5 (b, c) SELECT substr('abAB', (i % 4) + 1 , 1), substr('abAB', (i % 2) + 1 , 1) FROM generate_series(0, 5) i;
+INSERT INTO pagg_tab5 (b, c) SELECT substr('cdCD', (i % 4) + 1 , 1), substr('cdCD', (i % 2) + 1 , 1) FROM generate_series(0, 5) i;
+ANALYZE pagg_tab5;
+
+CREATE TABLE pagg_tab6 (c text collate case_insensitive, b text collate case_insensitive) PARTITION BY LIST (b collate case_insensitive);
+CREATE TABLE pagg_tab6_p1 PARTITION OF pagg_tab6 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab6_p2 PARTITION OF pagg_tab6 FOR VALUES IN ('c', 'd');
+INSERT INTO pagg_tab6 (b, c) SELECT substr('abAB', (i % 4) + 1 , 1), substr('abAB', (i % 2) + 1 , 1) FROM generate_series(0, 5) i;
+INSERT INTO pagg_tab6 (b, c) SELECT substr('cdCD', (i % 4) + 1 , 1), substr('cdCD', (i % 2) + 1 , 1) FROM generate_series(0, 5) i;
+ANALYZE pagg_tab5;
+
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab5 t1 JOIN pagg_tab6 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY t1.c COLLATE "C";
+SELECT t1.c, count(t2.c) FROM pagg_tab5 t1 JOIN pagg_tab6 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY t1.c COLLATE "C";
+
+SET enable_partitionwise_join TO false;
+EXPLAIN (COSTS OFF)
+SELECT t1.c, count(t2.c) FROM pagg_tab5 t1 JOIN pagg_tab6 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY t1.c COLLATE "C";
+SELECT t1.c, count(t2.c) FROM pagg_tab5 t1 JOIN pagg_tab6 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY t1.c COLLATE "C";
+
DROP TABLE pagg_tab3;
+DROP TABLE pagg_tab4;
+DROP TABLE pagg_tab5;
+DROP TABLE pagg_tab6;
RESET enable_partitionwise_aggregate;
RESET max_parallel_workers_per_gather;
--
2.43.0
v6-0001-Disallow-partitionwise-grouping-when-collation-do.patchapplication/octet-stream; name=v6-0001-Disallow-partitionwise-grouping-when-collation-do.patchDownload
From 6fa7144a91ca21b0ca6bd6fcc34a57ab517359d8 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 1 Nov 2024 16:15:50 +0900
Subject: [PATCH v6 1/2] Disallow partitionwise grouping when collation doesn't
match
Insist that the collation used for grouping matches exactly with the
collation used for partitioning to allow using either full or
partial partitionwise grouping.
Bug: #18568
Reported-by: Webbo Han <1105066510@qq.com>
Author: Webbo Han <1105066510@qq.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org
Discussion: https://postgr.es/m/tencent_9D9103CDA420C07768349CC1DFF88465F90A@qq.com
Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
---
src/backend/optimizer/plan/planner.c | 67 +++++++---
.../regress/expected/collate.icu.utf8.out | 114 ++++++++++++++++++
src/test/regress/sql/collate.icu.utf8.sql | 43 +++++++
3 files changed, 210 insertions(+), 14 deletions(-)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0f423e9684..24a3ff087e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -253,7 +253,8 @@ static void create_partitionwise_grouping_paths(PlannerInfo *root,
GroupPathExtraData *extra);
static bool group_by_has_partkey(RelOptInfo *input_rel,
List *targetList,
- List *groupClause);
+ List *groupClause,
+ bool *collation_incompatible);
static int common_prefix_cmp(const void *a, const void *b);
static List *generate_setop_child_grouplist(SetOperationStmt *op,
List *targetlist);
@@ -4090,23 +4091,30 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
if (extra->patype != PARTITIONWISE_AGGREGATE_NONE &&
IS_PARTITIONED_REL(input_rel))
{
+ bool collation_incompatible = false;
+ bool group_by_contains_partkey =
+ group_by_has_partkey(input_rel, extra->targetList,
+ root->parse->groupClause,
+ &collation_incompatible);
+
/*
* If this is the topmost relation or if the parent relation is doing
* full partitionwise aggregation, then we can do full partitionwise
* aggregation provided that the GROUP BY clause contains all of the
- * partitioning columns at this level. Otherwise, we can do at most
- * partial partitionwise aggregation. But if partial aggregation is
- * not supported in general then we can't use it for partitionwise
- * aggregation either.
+ * partitioning columns at this level and the collation used by GROUP
+ * BY matches the partitioning collation. Otherwise, we can do at most
+ * partial partitionwise aggregation, but again only if the collation
+ * is compatible. If partial aggregation is not supported in general
+ * then we can't use it for partitionwise aggregation either.
*
* Check parse->groupClause not processed_groupClause, because it's
* okay if some of the partitioning columns were proved redundant.
*/
if (extra->patype == PARTITIONWISE_AGGREGATE_FULL &&
- group_by_has_partkey(input_rel, extra->targetList,
- root->parse->groupClause))
+ group_by_contains_partkey)
patype = PARTITIONWISE_AGGREGATE_FULL;
- else if ((extra->flags & GROUPING_CAN_PARTIAL_AGG) != 0)
+ else if ((extra->flags & GROUPING_CAN_PARTIAL_AGG) != 0 &&
+ !collation_incompatible)
patype = PARTITIONWISE_AGGREGATE_PARTIAL;
else
patype = PARTITIONWISE_AGGREGATE_NONE;
@@ -8105,13 +8113,18 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
/*
* group_by_has_partkey
*
- * Returns true, if all the partition keys of the given relation are part of
- * the GROUP BY clauses, false otherwise.
+ * Returns true if all the partition keys of the given relation are part of
+ * the GROUP BY clauses, including having matching collation, false otherwise.
+ *
+ * Returns false also if a collation mismatch is detected between a partition
+ * key and its corresponding expression in groupClause, in which case.
+ * *collation_incompatible is set to true.
*/
static bool
group_by_has_partkey(RelOptInfo *input_rel,
List *targetList,
- List *groupClause)
+ List *groupClause,
+ bool *collation_incompatible)
{
List *groupexprs = get_sortgrouplist_exprs(groupClause, targetList);
int cnt = 0;
@@ -8134,13 +8147,39 @@ group_by_has_partkey(RelOptInfo *input_rel,
foreach(lc, partexprs)
{
+ ListCell *lg;
Expr *partexpr = lfirst(lc);
+ Oid partcoll = input_rel->part_scheme->partcollation[cnt];
- if (list_member(groupexprs, partexpr))
+ foreach(lg, groupexprs)
{
- found = true;
- break;
+ Expr *groupexpr = lfirst(lg);
+ Oid groupcoll = exprCollation((Node *) groupexpr);
+
+ if (IsA(groupexpr, RelabelType))
+ groupexpr = ((RelabelType *) groupexpr)->arg;
+
+ if (equal(groupexpr, partexpr))
+ {
+
+ /*
+ * Reject a match if the grouping collation does not match
+ * the partitioning collation.
+ */
+ if (OidIsValid(partcoll) && OidIsValid(groupcoll) &&
+ partcoll != groupcoll)
+ {
+ *collation_incompatible = true;
+ return false;
+ }
+
+ found = true;
+ break;
+ }
}
+
+ if (found)
+ break;
}
/*
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index faa376e060..a4cd9ea085 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2054,6 +2054,120 @@ SELECT (SELECT count(*) FROM test33_0) <> (SELECT count(*) FROM test33_1);
t
(1 row)
+--
+-- Bug #18568
+--
+-- Partitionwise aggregate (full or partial) should not be used when a
+-- partition key's collation doesn't match that of the GROUP BY column it is
+-- matched with.
+SET max_parallel_workers_per_gather TO 0;
+SET enable_incremental_sort TO off;
+CREATE TABLE pagg_tab3 (a text, c text collate case_insensitive) PARTITION BY LIST(c collate "C");
+CREATE TABLE pagg_tab3_p1 PARTITION OF pagg_tab3 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab3_p2 PARTITION OF pagg_tab3 FOR VALUES IN ('B', 'A');
+INSERT INTO pagg_tab3 SELECT i % 4 + 1, substr('abAB', (i % 4) + 1 , 1) FROM generate_series(0, 11) i;
+ANALYZE pagg_tab3;
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ QUERY PLAN
+-----------------------------------------------------------
+ Sort
+ Sort Key: (upper(pagg_tab3.c)) COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: pagg_tab3.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3_1
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_2
+(7 rows)
+
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ upper | count
+-------+-------
+ A | 6
+ B | 6
+(2 rows)
+
+-- No partitionwise aggregation allowed!
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ QUERY PLAN
+-----------------------------------------------------------
+ Sort
+ Sort Key: (upper(pagg_tab3.c)) COLLATE case_insensitive
+ -> HashAggregate
+ Group Key: pagg_tab3.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3_1
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_2
+(7 rows)
+
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+ upper | count
+-------+-------
+ A | 6
+ B | 6
+(2 rows)
+
+-- OK to use full partitionwise aggregate after changing the GROUP BY column's
+-- collation to be the same as that of the partition key.
+EXPLAIN (COSTS OFF)
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+ QUERY PLAN
+--------------------------------------------------------
+ Sort
+ Sort Key: ((pagg_tab3.c)::text) COLLATE "C"
+ -> Append
+ -> HashAggregate
+ Group Key: (pagg_tab3.c)::text
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3
+ -> HashAggregate
+ Group Key: (pagg_tab3_1.c)::text
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_1
+(9 rows)
+
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+ c | count
+---+-------
+ A | 3
+ B | 3
+ a | 3
+ b | 3
+(4 rows)
+
+-- Also OK to use partial partitionwise aggregate when grouping columns do not
+-- include partition key columns
+EXPLAIN (COSTS OFF)
+SELECT a collate "C", count(c collate "C") FROM pagg_tab3 GROUP BY a collate "C" ORDER BY 1;
+ QUERY PLAN
+--------------------------------------------------------------
+ Finalize GroupAggregate
+ Group Key: ((pagg_tab3.a)::text)
+ -> Sort
+ Sort Key: ((pagg_tab3.a)::text) COLLATE "C"
+ -> Append
+ -> Partial HashAggregate
+ Group Key: (pagg_tab3.a)::text
+ -> Seq Scan on pagg_tab3_p2 pagg_tab3
+ -> Partial HashAggregate
+ Group Key: (pagg_tab3_1.a)::text
+ -> Seq Scan on pagg_tab3_p1 pagg_tab3_1
+(11 rows)
+
+SELECT a collate "C", count(c collate "C") FROM pagg_tab3 GROUP BY a collate "C" ORDER BY 1;
+ a | count
+---+-------
+ 1 | 3
+ 2 | 3
+ 3 | 3
+ 4 | 3
+(4 rows)
+
+DROP TABLE pagg_tab3;
+RESET enable_partitionwise_aggregate;
+RESET max_parallel_workers_per_gather;
+RESET enable_incremental_sort;
-- cleanup
RESET search_path;
SET client_min_messages TO warning;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 80f28a97d7..f523dd73be 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -796,6 +796,49 @@ INSERT INTO test33 VALUES (2, 'DEF');
-- they end up in the same partition (but it's platform-dependent which one)
SELECT (SELECT count(*) FROM test33_0) <> (SELECT count(*) FROM test33_1);
+--
+-- Bug #18568
+--
+-- Partitionwise aggregate (full or partial) should not be used when a
+-- partition key's collation doesn't match that of the GROUP BY column it is
+-- matched with.
+SET max_parallel_workers_per_gather TO 0;
+SET enable_incremental_sort TO off;
+
+CREATE TABLE pagg_tab3 (a text, c text collate case_insensitive) PARTITION BY LIST(c collate "C");
+CREATE TABLE pagg_tab3_p1 PARTITION OF pagg_tab3 FOR VALUES IN ('a', 'b');
+CREATE TABLE pagg_tab3_p2 PARTITION OF pagg_tab3 FOR VALUES IN ('B', 'A');
+INSERT INTO pagg_tab3 SELECT i % 4 + 1, substr('abAB', (i % 4) + 1 , 1) FROM generate_series(0, 11) i;
+ANALYZE pagg_tab3;
+
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+
+-- No partitionwise aggregation allowed!
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c collate case_insensitive ORDER BY 1;
+
+-- OK to use full partitionwise aggregate after changing the GROUP BY column's
+-- collation to be the same as that of the partition key.
+EXPLAIN (COSTS OFF)
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
+
+-- Also OK to use partial partitionwise aggregate when grouping columns do not
+-- include partition key columns
+EXPLAIN (COSTS OFF)
+SELECT a collate "C", count(c collate "C") FROM pagg_tab3 GROUP BY a collate "C" ORDER BY 1;
+SELECT a collate "C", count(c collate "C") FROM pagg_tab3 GROUP BY a collate "C" ORDER BY 1;
+
+DROP TABLE pagg_tab3;
+
+RESET enable_partitionwise_aggregate;
+RESET max_parallel_workers_per_gather;
+RESET enable_incremental_sort;
-- cleanup
RESET search_path;
--
2.43.0
looks good to me.
I didn't find any issue.
group_by_has_partkey can even cope with:
EXPLAIN (COSTS OFF, settings)
SELECT c collate "C" collate case_insensitive collate "C", count(c)
FROM pagg_tab3 GROUP BY c collate "C" collate case_insensitive collate
"C" ORDER BY 1;
so i guess in group_by_has_partkey
if (IsA(groupexpr, RelabelType))
groupexpr = ((RelabelType *) groupexpr)->arg;
should be enough.
not need while loop.
while (IsA(groupexpr, RelabelType))
groupexpr = (Expr *) (castNode(RelabelType, groupexpr))->arg;
Hi,
On Wed, Nov 6, 2024 at 12:19 PM jian he <jian.universality@gmail.com> wrote:
looks good to me.
I didn't find any issue.
Thanks for the review.
group_by_has_partkey can even cope with:
EXPLAIN (COSTS OFF, settings)
SELECT c collate "C" collate case_insensitive collate "C", count(c)
FROM pagg_tab3 GROUP BY c collate "C" collate case_insensitive collate
"C" ORDER BY 1;so i guess in group_by_has_partkey
if (IsA(groupexpr, RelabelType))
groupexpr = ((RelabelType *) groupexpr)->arg;
should be enough.not need while loop.
while (IsA(groupexpr, RelabelType))
groupexpr = (Expr *) (castNode(RelabelType, groupexpr))->arg;
Added a comment about that.
Pushed both patches after making changes to 0001 to allow "partial"
partitionwise aggregation after all. The differences in output with
partial partitionwise aggregation and no partitionwise aggregation
that I mentioned before don't seem to have anything to do with
partitionwise aggregation, but apparently with whether aggregation was
hashed or not. I confirmed that by turning enable_hashagg on and off
to see the difference. Changing enable_partitionwise_aggregate for
either of the values of enable_hashagg didn't change the plan.
Thank you all for working on this.
--
Thanks, Amit Langote