Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

tndrwang@gmail.com

over 1 year ago

In reply to: Tender Wang (#1)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

Tender Wang <tndrwang@gmail.com> 于2024年10月23日周三 21:48写道：

Hi all,

I think the root cause of this thread and [1] are same. We don't use the
Partition Key collation but column's
collation to fill the RelOptInfo partexprs field in
set_baserel_partition_key_exprs().
If the Partition Key definition is same as column definition, which most
times is,
that will be ok. But if it's not, this thread issue will arise.

As far as I know, only partition pruning logic considers not only call
equal(), but also check collation match.
Other codes only call equal() to check if the exprs match the partition
key.
For example, in this thread case, match_expr_to_partition_keys() think the
expr match the partition key:
if (equal(lfirst(lc), expr))
return cnt;

Although We can fix this issue like [1], I think why not directly use the
partkey->partcollation[cnt], which its value is
same with pg_partitioned_table's partcollation. I tried this to fix [1],
but at that time, I was unsure if it was the correct fix.

Until I find this thread issue, I think we should do it this way.
In the attached patch, I include this thread test and [1] test case.

Hi
In the last email, I proposed to use partcollation directly. But I ignore
the case that partkey is not a simple column reference.
for expample:
create table coll_pruning_multi (a text) partition by range (substr(a,
1) collate "POSIX", substr(a, 1) collate "C");

The partexprs in RelOptInfo store substr(a,1), a FuncExpr, and its
collation is same with column a not collate "posix".

I'm now thinking how we can use a uniform solution to fix this thread issue
and issue in [1]/messages/by-id/18568-2a9afb6b9f7e6ed3@postgresql.org -- Thanks, Tender Wang

[1]: /messages/by-id/18568-2a9afb6b9f7e6ed3@postgresql.org -- Thanks, Tender Wang
/messages/by-id/18568-2a9afb6b9f7e6ed3@postgresql.org
--
Thanks,
Tender Wang

tndrwang@gmail.com

over 1 year ago

In reply to: Tender Wang (#1)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

Tender Wang <tndrwang@gmail.com> 于2024年10月23日周三 21:48写道：

Hi all,

I find another issue as $SUBJECT when I work on [1].

When I continue to work on this, I find below issue. But I'm not sure
whether it is a bug.

postgres=# create table part_index(a text primary key) partition by list (
a collate "POSIX");
ERROR: unique constraint on partitioned table must include all
partitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a"
which is part of the partition key.
postgres=# create table part_index(a text) partition by list ( a collate
"POSIX");
CREATE TABLE
postgres=# alter table part_index add primary key (a);
ERROR: unique constraint on partitioned table must include all
partitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a"
which is part of the partition key.

It seems we can't create a primary key if the collation is different
between columnDef and PartitionKey.

By the way, I think the error message is misleading to users.
ostgres=# alter table part_index add primary key (a);
ERROR: unique constraint on partitioned table must include all
partitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a"
which is part of the partition key.

--
Thanks,
Tender Wang

Langote_Amit_f8@lab.ntt.co.jp

over 1 year ago

In reply to: Tender Wang (#3)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

Hi,

On Thu, Oct 24, 2024 at 1:46 PM Tender Wang <tndrwang@gmail.com> wrote:

Tender Wang <tndrwang@gmail.com> 于2024年10月23日周三 21:48写道：

Hi all,

I find another issue as $SUBJECT when I work on [1].

When I continue to work on this, I find below issue. But I'm not sure whether it is a bug.

postgres=# create table part_index(a text primary key) partition by list ( a collate "POSIX");
ERROR: unique constraint on partitioned table must include all partitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a" which is part of the partition key.
postgres=# create table part_index(a text) partition by list ( a collate "POSIX");
CREATE TABLE
postgres=# alter table part_index add primary key (a);
ERROR: unique constraint on partitioned table must include all partitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a" which is part of the partition key.

It seems we can't create a primary key if the collation is different between columnDef and PartitionKey.

Yeah, you don't want to have the PK index and the partitioning logic
to not be in sync about the collation rules applied to the individual
rows.

By the way, I think the error message is misleading to users.
ostgres=# alter table part_index add primary key (a);
ERROR: unique constraint on partitioned table must include all partitioning columns
DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a" which is part of the partition key.

I think it's kind of similar to the message you get when a GROUP BY
column's collation doesn't match the column appearing in the SELECT
list:

explain SELECT c collate case_insensitive, count(c) FROM
pagg_tab_case_s GROUP BY c collate "C";
ERROR: column "pagg_tab_case_s.c" must appear in the GROUP BY clause
or be used in an aggregate function
LINE 1: explain SELECT c collate case_insensitive, count(c) FROM pag...

Perhaps it would be more helpful for the error message or hint or
detail to mention the actual discrepancy (collation mismatch) that's
causing the error.

There might be other instances of such an error and I am not sure it
would be worthwhile to find and fix them all.

--
Thanks, Amit Langote

tndrwang@gmail.com

over 1 year ago

In reply to: Amit Langote (#4)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

Amit Langote <amitlangote09@gmail.com> 于2024年10月24日周四 14:33写道：

Hi,

On Thu, Oct 24, 2024 at 1:46 PM Tender Wang <tndrwang@gmail.com> wrote:

Tender Wang <tndrwang@gmail.com> 于2024年10月23日周三 21:48写道：

Hi all,

I find another issue as $SUBJECT when I work on [1].

When I continue to work on this, I find below issue. But I'm not sure

whether it is a bug.

postgres=# create table part_index(a text primary key) partition by list

( a collate "POSIX");

ERROR: unique constraint on partitioned table must include all

partitioning columns

DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a"

which is part of the partition key.

postgres=# create table part_index(a text) partition by list ( a collate

"POSIX");

CREATE TABLE
postgres=# alter table part_index add primary key (a);
ERROR: unique constraint on partitioned table must include all

partitioning columns

DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a"

which is part of the partition key.

It seems we can't create a primary key if the collation is different

between columnDef and PartitionKey.

Yeah, you don't want to have the PK index and the partitioning logic
to not be in sync about the collation rules applied to the individual
rows.

By the way, I think the error message is misleading to users.
ostgres=# alter table part_index add primary key (a);
ERROR: unique constraint on partitioned table must include all

partitioning columns

DETAIL: PRIMARY KEY constraint on table "part_index" lacks column "a"

which is part of the partition key.

I think it's kind of similar to the message you get when a GROUP BY
column's collation doesn't match the column appearing in the SELECT
list:

explain SELECT c collate case_insensitive, count(c) FROM
pagg_tab_case_s GROUP BY c collate "C";
ERROR: column "pagg_tab_case_s.c" must appear in the GROUP BY clause
or be used in an aggregate function
LINE 1: explain SELECT c collate case_insensitive, count(c) FROM pag...

Perhaps it would be more helpful for the error message or hint or
detail to mention the actual discrepancy (collation mismatch) that's
causing the error.

There might be other instances of such an error and I am not sure it
would be worthwhile to find and fix them all.

Thanks for the explanation. We had better focus on the wrong result issue.

I feel that it's hard only to use one struct(for example, X), which just
calls equal(X, expr)
can check both the expression match and the collation match.

Maybe we should add another collation match checks in
match_clause_to_partition_key(), like
partition pruning logic does.

Any thoughts?

--
Thanks,
Tender Wang

[1]: https://en.wikipedia.org/wiki/Join_%28SQL%29#Equi-join

jian.universality@gmail.com

over 1 year ago

In reply to: Tender Wang (#5)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

On Thu, Oct 24, 2024 at 3:01 PM Tender Wang <tndrwang@gmail.com> wrote:

I feel that it's hard only to use one struct(for example, X), which just calls equal(X, expr)
can check both the expression match and the collation match.

in RelOptInfo->partexprs, maybe we should mention that the partition
key collation is stored
in RelOptInfo->part_scheme, not here.

Maybe we should add another collation match checks in match_clause_to_partition_key(), like
partition pruning logic does.

in match_clause_to_partition_key
we already have

else if (IsA(clause, OpExpr) &&
list_length(((OpExpr *) clause)->args) == 2)
{
/*
* Partition key match also requires collation match. There may be
* multiple partkeys with the same expression but different
* collations, so failure is NOMATCH.
*/
if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
return PARTCLAUSE_NOMATCH;
}
else if (IsA(clause, ScalarArrayOpExpr))
{
if (!equal(leftop, partkey) ||
!PartCollMatchesExprColl(partcoll, saop->inputcollid))
return PARTCLAUSE_NOMATCH;
}
So I think match_clause_to_partition_key handling collation is fine.

I think the problem is match_expr_to_partition_keys
don't have a collation related check.

CREATE TABLE pagg_join1 (c text collate case_insensitive) PARTITION BY
LIST(c collate "C");
CREATE TABLE pagg_join2 (c text collate "C") PARTITION BY LIST(c
collate case_insensitive);
CREATE TABLE pagg_join3 (c text collate "POSIX") PARTITION BY LIST(c
collate "C");
CREATE TABLE pagg_join4 (c text collate case_insensitive) PARTITION BY
LIST(c collate ignore_accents);

Our partition-wise join is based on Equi-join [1]https://en.wikipedia.org/wiki/Join_%28SQL%29#Equi-join.
In some cases,column and partitionkey collation are different,
but if these two collations are deterministic, then texteq should work
as expected.
So I think, pagg_join3 can do partition-wise join,
I think pagg_join2 can do partition-wise join also.

we can let all (pagg_join1, pagg_join2, pagg_join3, pagg_join4) cannot
do partition-wise join (join with themself),
or we can let pagg_join2, pagg_join3 do partition-wise join (join with
themself).

POC attached, will let pagg_join2, pagg_join3 do partition-wise join.

tndrwang@gmail.com

over 1 year ago

In reply to: jian he (#6)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

jian he <jian.universality@gmail.com> 于2024年10月29日周二 14:15写道：

On Thu, Oct 24, 2024 at 3:01 PM Tender Wang <tndrwang@gmail.com> wrote:

I feel that it's hard only to use one struct(for example, X), which just

calls equal(X, expr)

can check both the expression match and the collation match.

in RelOptInfo->partexprs, maybe we should mention that the partition
key collation is stored
in RelOptInfo->part_scheme, not here.

Maybe we should add another collation match checks in

match_clause_to_partition_key(), like

partition pruning logic does.

in match_clause_to_partition_key
we already have

else if (IsA(clause, OpExpr) &&
list_length(((OpExpr *) clause)->args) == 2)
{
/*
* Partition key match also requires collation match. There may be
* multiple partkeys with the same expression but different
* collations, so failure is NOMATCH.
*/
if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
return PARTCLAUSE_NOMATCH;
}
else if (IsA(clause, ScalarArrayOpExpr))
{
if (!equal(leftop, partkey) ||
!PartCollMatchesExprColl(partcoll, saop->inputcollid))
return PARTCLAUSE_NOMATCH;
}
So I think match_clause_to_partition_key handling collation is fine.

I think the problem is match_expr_to_partition_keys
don't have a collation related check.

Sorry, it's a typo. It should be match_expr_to_partition_keys().

CREATE TABLE pagg_join1 (c text collate case_insensitive) PARTITION BY
LIST(c collate "C");
CREATE TABLE pagg_join2 (c text collate "C") PARTITION BY LIST(c
collate case_insensitive);
CREATE TABLE pagg_join3 (c text collate "POSIX") PARTITION BY LIST(c
collate "C");
CREATE TABLE pagg_join4 (c text collate case_insensitive) PARTITION BY
LIST(c collate ignore_accents);

Our partition-wise join is based on Equi-join [1].
In some cases,column and partitionkey collation are different,
but if these two collations are deterministic, then texteq should work
as expected.
So I think, pagg_join3 can do partition-wise join,
I think pagg_join2 can do partition-wise join also.

we can let all (pagg_join1, pagg_join2, pagg_join3, pagg_join4) cannot
do partition-wise join (join with themself),
or we can let pagg_join2, pagg_join3 do partition-wise join (join with
themself).

POC attached, will let pagg_join2, pagg_join3 do partition-wise join.

Hmm, I'm not sure

[1] https://en.wikipedia.org/wiki/Join_%28SQL%29#Equi-join

--
Thanks,
Tender Wang

jian.universality@gmail.com

over 1 year ago

In reply to: Tender Wang (#7)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

I missed a case when column collation and partition key collation are
the same and indeterministic.
that should be fine for partition-wise join.
so v2 attached.

have_partkey_equi_join, match_expr_to_partition_keys didn't do any
collation related check.
propose v2 change disallow partitionwise join for case when
column collation is indeterministic *and* is differ from partition
key's collation.

the attached partition_wise_join_collation.sql is the test script.
you may use it to compare with the master behavior.

Junwang Zhao

zhjwpku@gmail.com

over 1 year ago

In reply to: jian he (#8)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

On Wed, Oct 30, 2024 at 11:58 AM jian he <jian.universality@gmail.com> wrote:

I missed a case when column collation and partition key collation are
the same and indeterministic.
that should be fine for partition-wise join.
so v2 attached.

have_partkey_equi_join, match_expr_to_partition_keys didn't do any
collation related check.
propose v2 change disallow partitionwise join for case when
column collation is indeterministic *and* is differ from partition
key's collation.

the attached partition_wise_join_collation.sql is the test script.
you may use it to compare with the master behavior.

What if the partkey collation and column collation are both deterministic,
but with different sort order?

I'm not familiar with this part of the code base, but it seems to me the
partition wise join should use partkey collation instead of column collation,
because it's the partkey collation that decides which partition a row to
be dispatched.

What Jian proposed is also reasonable but seems another aspect of $subject?

Just some random thought, might be wrong ;(

--
Regards
Junwang Zhao

#10

Langote_Amit_f8@lab.ntt.co.jp

over 1 year ago

In reply to: Junwang Zhao (#9)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

On Wed, Oct 30, 2024 at 9:36 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Wed, Oct 30, 2024 at 11:58 AM jian he <jian.universality@gmail.com> wrote:

I missed a case when column collation and partition key collation are
the same and indeterministic.
that should be fine for partition-wise join.
so v2 attached.

have_partkey_equi_join, match_expr_to_partition_keys didn't do any
collation related check.
propose v2 change disallow partitionwise join for case when
column collation is indeterministic *and* is differ from partition
key's collation.

the attached partition_wise_join_collation.sql is the test script.
you may use it to compare with the master behavior.

What if the partkey collation and column collation are both deterministic,
but with different sort order?

I'm not familiar with this part of the code base, but it seems to me the
partition wise join should use partkey collation instead of column collation,
because it's the partkey collation that decides which partition a row to
be dispatched.

What Jian proposed is also reasonable but seems another aspect of $subject?

I think we should insist that the join key collation and the partition
collation are exactly the same and refuse to match them if they are
not.

+           {
+               Oid     colloid =  exprCollation((Node *) expr);
+
+               if ((partcoll != colloid) &&
+                    OidIsValid(colloid) &&
+                    !get_collation_isdeterministic(colloid))
+                   *coll_incompatiable = true;

I am not quite sure what is the point of checking whether or not the
expression collation is deterministic after confirming that it's not
the same as partcoll.

Attached 0002 is what I came up with. One thing that's different from
what Jian proposed is that match_expr_to_partition_keys() returns -1
(expr not matched to any key) when the collation is also not matched
instead of using a separate output parameter for that.

0001 is the patch for the partitionwise grouping issue (bug #18568).
I concluded that even partial aggregate should be disallowed when the
grouping collation doesn't match partitioning collation. The result
with partial partitionwise grouping is not the same as when
enable_partitionwise_aggregate is off.

--
Thanks, Amit Langote

#11

tndrwang@gmail.com

over 1 year ago

In reply to: Amit Langote (#10)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

Amit Langote <amitlangote09@gmail.com> 于2024年10月31日周四 21:09写道：

On Wed, Oct 30, 2024 at 9:36 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Wed, Oct 30, 2024 at 11:58 AM jian he <jian.universality@gmail.com>

wrote:

I missed a case when column collation and partition key collation are
the same and indeterministic.
that should be fine for partition-wise join.
so v2 attached.

have_partkey_equi_join, match_expr_to_partition_keys didn't do any
collation related check.
propose v2 change disallow partitionwise join for case when
column collation is indeterministic *and* is differ from partition
key's collation.

the attached partition_wise_join_collation.sql is the test script.
you may use it to compare with the master behavior.

What if the partkey collation and column collation are both

deterministic,

but with different sort order?

I'm not familiar with this part of the code base, but it seems to me the
partition wise join should use partkey collation instead of column

collation,

because it's the partkey collation that decides which partition a row to
be dispatched.

What Jian proposed is also reasonable but seems another aspect of

$subject?

I think we should insist that the join key collation and the partition
collation are exactly the same and refuse to match them if they are
not.
+           {
+               Oid     colloid =  exprCollation((Node *) expr);
+
+               if ((partcoll != colloid) &&
+                    OidIsValid(colloid) &&
+                    !get_collation_isdeterministic(colloid))
+                   *coll_incompatiable = true;
I am not quite sure what is the point of checking whether or not the
expression collation is deterministic after confirming that it's not
the same as partcoll.

Me, too.

Attached 0002 is what I came up with. One thing that's different from
what Jian proposed is that match_expr_to_partition_keys() returns -1

Agree.

(expr not matched to any key) when the collation is also not matched

instead of using a separate output parameter for that.

In have_partkey_equi_join()
...
if (exprs_known_equal(root, expr1, expr2, btree_opfamily))
{
Oid partcoll2 = rel1->part_scheme->partcollation[ipk];
....

I think we should use rel2 here, not rel1.

--
Thanks,
Tender Wang

#12

tndrwang@gmail.com

over 1 year ago

In reply to: Amit Langote (#10)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

Amit Langote <amitlangote09@gmail.com> 于2024年10月31日周四 21:09写道：

0001 is the patch for the partitionwise grouping issue (bug #18568).
I concluded that even partial aggregate should be disallowed when the
grouping collation doesn't match partitioning collation. The result
with partial partitionwise grouping is not the same as when
enable_partitionwise_aggregate is off.

LGTM

--
Thanks,
Tender Wang

#13

jian.universality@gmail.com

over 1 year ago

In reply to: Amit Langote (#10)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

On Thu, Oct 31, 2024 at 9:09 PM Amit Langote <amitlangote09@gmail.com> wrote:

I think we should insist that the join key collation and the partition
collation are exactly the same and refuse to match them if they are
not.
+           {
+               Oid     colloid =  exprCollation((Node *) expr);
+
+               if ((partcoll != colloid) &&
+                    OidIsValid(colloid) &&
+                    !get_collation_isdeterministic(colloid))
+                   *coll_incompatiable = true;
I am not quite sure what is the point of checking whether or not the
expression collation is deterministic after confirming that it's not
the same as partcoll.

Attached 0002 is what I came up with. One thing that's different from
what Jian proposed is that match_expr_to_partition_keys() returns -1
(expr not matched to any key) when the collation is also not matched
instead of using a separate output parameter for that.

i was thinking that
CREATE TABLE part_tab (c text collate "POSIX") PARTITION BY LIST(c collate "C");
maybe can do partitionwise join.
join key collation and the partition key collation same sure would
make things easy.

about 0002.
Similar to PartCollMatchesExprColl in match_clause_to_partition_key
I think we can simply do the following:
no need to hack match_expr_to_partition_keys.

@@ -2181,6 +2181,9 @@ have_partkey_equi_join(PlannerInfo *root,
RelOptInfo *joinrel,
if (ipk1 != ipk2)
continue;

+               if (rel1->part_scheme->partcollation[ipk1] !=
opexpr->inputcollid)
+                       return false;

#14

Langote_Amit_f8@lab.ntt.co.jp

over 1 year ago

In reply to: Tender Wang (#11)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

Hi,

On Fri, Nov 1, 2024 at 11:39 AM Tender Wang <tndrwang@gmail.com> wrote:

Amit Langote <amitlangote09@gmail.com> 于2024年10月31日周四 21:09写道：

On Wed, Oct 30, 2024 at 9:36 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Wed, Oct 30, 2024 at 11:58 AM jian he <jian.universality@gmail.com> wrote:

In have_partkey_equi_join()
...
if (exprs_known_equal(root, expr1, expr2, btree_opfamily))
{
Oid partcoll2 = rel1->part_scheme->partcollation[ipk];
....
I think we should use rel2 here, not rel1.

Hmm, yeah, this should be fixed. Though, this is not a problem
because both rel1 and rel2 would be pointing to the same
PartitionScheme, because otherwise we wouldn't be in
have_partkey_equi_join(). See this in build_joinrel_partition_info():

/*
* We can only consider this join as an input to further partitionwise
* joins if (a) the input relations are partitioned and have
* consider_partitionwise_join=true, (b) the partition schemes match, and
* (c) we can identify an equi-join between the partition keys. Note that
* if it were possible for have_partkey_equi_join to return different
* answers for the same joinrel depending on which join ordering we try
* first, this logic would break. That shouldn't happen, though, because
* of the way the query planner deduces implied equalities and reorders
* the joins. Please see optimizer/README for details.
*/
if (outer_rel->part_scheme == NULL || inner_rel->part_scheme == NULL ||
!outer_rel->consider_partitionwise_join ||
!inner_rel->consider_partitionwise_join ||
outer_rel->part_scheme != inner_rel->part_scheme ||
!have_partkey_equi_join(root, joinrel, outer_rel, inner_rel,
sjinfo->jointype, restrictlist))
{
Assert(!IS_PARTITIONED_REL(joinrel));
return;
}

I've changed the condition to match only partcoll1 and exprcoll1, and
if they match, Assert that partcoll2 and exprcoll2 match too. That's
because partcoll1 and partcoll2 would be the same as they are from the
same PartitionScheme and expr1 and expr2 are known equal() so their
collations should match too.

--
Thanks, Amit Langote

#15

Langote_Amit_f8@lab.ntt.co.jp

over 1 year ago

In reply to: jian he (#13)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

Hi,

On Fri, Nov 1, 2024 at 2:39 PM jian he <jian.universality@gmail.com> wrote:

On Thu, Oct 31, 2024 at 9:09 PM Amit Langote <amitlangote09@gmail.com> wrote:
I think we should insist that the join key collation and the partition
collation are exactly the same and refuse to match them if they are
not.
+           {
+               Oid     colloid =  exprCollation((Node *) expr);
+
+               if ((partcoll != colloid) &&
+                    OidIsValid(colloid) &&
+                    !get_collation_isdeterministic(colloid))
+                   *coll_incompatiable = true;
I am not quite sure what is the point of checking whether or not the
expression collation is deterministic after confirming that it's not
the same as partcoll.

Attached 0002 is what I came up with. One thing that's different from
what Jian proposed is that match_expr_to_partition_keys() returns -1
(expr not matched to any key) when the collation is also not matched
instead of using a separate output parameter for that.
i was thinking that
CREATE TABLE part_tab (c text collate "POSIX") PARTITION BY LIST(c collate "C");
maybe can do partitionwise join.
join key collation and the partition key collation same sure would
make things easy.

I think that's maybe ok to do as a new feature (use partitionwise join
even if collations differ but are both deterministic?), but we should
take a more restrictive approach in a bug fix that is to be
back-patched.

about 0002.
Similar to PartCollMatchesExprColl in match_clause_to_partition_key
I think we can simply do the following:
no need to hack match_expr_to_partition_keys.

@@ -2181,6 +2181,9 @@ have_partkey_equi_join(PlannerInfo *root,
RelOptInfo *joinrel,
if (ipk1 != ipk2)
continue;
+               if (rel1->part_scheme->partcollation[ipk1] !=
opexpr->inputcollid)
+                       return false;

Yes, looks like that should be enough, thanks.

I've updated the patch. I've added another test case to test the new
collation matching code in the code block of have_partkey_equi_join()
that pairs partition keys via equivalence class.

Adding Ashutosh to cc, as the original author of this code, to get his
thoughts on these fixes.

--
Thanks, Amit Langote

#16

Langote_Amit_f8@lab.ntt.co.jp

over 1 year ago

In reply to: Tender Wang (#1)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

Hi,

On Wed, Oct 23, 2024 at 10:48 PM Tender Wang <tndrwang@gmail.com> wrote:

I think the root cause of this thread and [1] are same. We don't use the Partition Key collation but column's
collation to fill the RelOptInfo partexprs field in set_baserel_partition_key_exprs().
If the Partition Key definition is same as column definition, which most times is,
that will be ok. But if it's not, this thread issue will arise.

As far as I know, only partition pruning logic considers not only call equal(), but also check collation match.
Other codes only call equal() to check if the exprs match the partition key.
For example, in this thread case, match_expr_to_partition_keys() think the expr match the partition key:
if (equal(lfirst(lc), expr))
return cnt;

Although We can fix this issue like [1], I think why not directly use the partkey->partcollation[cnt], which its value is
same with pg_partitioned_table's partcollation. I tried this to fix [1], but at that time, I was unsure if it was the correct fix.

I think it would be better to keep RelOptInfo.partexprs unchanged in
these fixes. I haven't been able to come up with a way to "assign"
the correct collation to partition keys that are not simple column
references. Checking PartitionScheme.partcollation separately is
enough to fix these bugs and aligns with the style of existing code,
such as match_clause_to_partition_key() in partprune.c.

--
Thanks, Amit Langote

#17

tndrwang@gmail.com

over 1 year ago

In reply to: Amit Langote (#16)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

Amit Langote <amitlangote09@gmail.com> 于2024年11月1日周五 15:27写道：

Hi,

On Wed, Oct 23, 2024 at 10:48 PM Tender Wang <tndrwang@gmail.com> wrote:

I think the root cause of this thread and [1] are same. We don't use

the Partition Key collation but column's

collation to fill the RelOptInfo partexprs field in

set_baserel_partition_key_exprs().

If the Partition Key definition is same as column definition, which

most times is,

that will be ok. But if it's not, this thread issue will arise.

As far as I know, only partition pruning logic considers not only call

equal(), but also check collation match.

Other codes only call equal() to check if the exprs match the partition

key.

For example, in this thread case, match_expr_to_partition_keys() think

the expr match the partition key:

if (equal(lfirst(lc), expr))
return cnt;

Although We can fix this issue like [1], I think why not directly use

the partkey->partcollation[cnt], which its value is

same with pg_partitioned_table's partcollation. I tried this to fix [1],

but at that time, I was unsure if it was the correct fix.

I think it would be better to keep RelOptInfo.partexprs unchanged in
these fixes. I haven't been able to come up with a way to "assign"
the correct collation to partition keys that are not simple column
references. Checking PartitionScheme.partcollation separately is
enough to fix these bugs and aligns with the style of existing code,
such as match_clause_to_partition_key() in partprune.c.

Agree. I can't figure out another solution.

--
Thanks,
Tender Wang

#18

jian.universality@gmail.com

over 1 year ago

In reply to: Amit Langote (#15)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

just a quick reply while testing v4-0001.
tests copy from src/test/regress/sql/partition_aggregate.sql first 40 lines.

drop table if exists pagg_tab;
CREATE TABLE pagg_tab (a int, b int, c text, d int) PARTITION BY
LIST(c collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000',
'0001', '0002', '0003', '0004');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0005',
'0006', '0007', '0008');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0009',
'0010', '0011');
INSERT INTO pagg_tab SELECT (i % 20), (i % 30), to_char(i % 12,
'FM0000'), i % 30 FROM generate_series(0, 2999) i;
ANALYZE pagg_tab;

EXPLAIN (COSTS OFF, settings)
SELECT a, sum(b), avg(b), count(*), max(b) FROM pagg_tab GROUP BY a;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize HashAggregate
Group Key: pagg_tab.a
-> Append
-> Partial HashAggregate
Group Key: pagg_tab.a
-> Seq Scan on pagg_tab_p1 pagg_tab
-> Partial HashAggregate
Group Key: pagg_tab_1.a
-> Seq Scan on pagg_tab_p2 pagg_tab_1
-> Partial HashAggregate
Group Key: pagg_tab_2.a
-> Seq Scan on pagg_tab_p3 pagg_tab_2
Settings: enable_partitionwise_aggregate = 'on',
enable_partitionwise_join = 'on', max_parallel_workers_per_gather =
'0', enable_increm
ental_sort = 'off'

drop table if exists pagg_tab;
CREATE TABLE pagg_tab (a text, b int, c text, d int) PARTITION BY
LIST(c collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000',
'0001', '0002', '0003', '0004');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0005',
'0006', '0007', '0008');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0009',
'0010', '0011');
INSERT INTO pagg_tab SELECT (i % 20)::text, (i % 30), to_char(i % 12,
'FM0000'), i % 30 FROM generate_series(0, 2999) i;
ANALYZE pagg_tab;
EXPLAIN (COSTS OFF, settings)
SELECT a, sum(b), avg(b), count(*), max(b) FROM pagg_tab GROUP BY a;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate
Group Key: pagg_tab.a
-> Append
-> Seq Scan on pagg_tab_p1 pagg_tab_1
-> Seq Scan on pagg_tab_p2 pagg_tab_2
-> Seq Scan on pagg_tab_p3 pagg_tab_3
Settings: enable_partitionwise_aggregate = 'on',
enable_partitionwise_join = 'on', max_parallel_workers_per_gather =
'0', enable_increm
ental_sort = 'off'

it seems "PARTITION BY LIST(c collate "C");" collation compare with
"GROUP BY a;".
set collation_incompatible returned true.
make it cannot do PARTITIONWISE_AGGREGATE_PARTIAL.

but here "group by a", "a" is text data type, we can still do
PARTITIONWISE_AGGREGATE_PARTIAL
?

#19

Langote_Amit_f8@lab.ntt.co.jp

over 1 year ago

In reply to: jian he (#18)

Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

On Fri, Nov 1, 2024 at 5:08 PM jian he <jian.universality@gmail.com> wrote:

just a quick reply while testing v4-0001.
tests copy from src/test/regress/sql/partition_aggregate.sql first 40 lines.

drop table if exists pagg_tab;
CREATE TABLE pagg_tab (a int, b int, c text, d int) PARTITION BY
LIST(c collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000',
'0001', '0002', '0003', '0004');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0005',
'0006', '0007', '0008');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0009',
'0010', '0011');
INSERT INTO pagg_tab SELECT (i % 20), (i % 30), to_char(i % 12,
'FM0000'), i % 30 FROM generate_series(0, 2999) i;
ANALYZE pagg_tab;

EXPLAIN (COSTS OFF, settings)
SELECT a, sum(b), avg(b), count(*), max(b) FROM pagg_tab GROUP BY a;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize HashAggregate
Group Key: pagg_tab.a
-> Append
-> Partial HashAggregate
Group Key: pagg_tab.a
-> Seq Scan on pagg_tab_p1 pagg_tab
-> Partial HashAggregate
Group Key: pagg_tab_1.a
-> Seq Scan on pagg_tab_p2 pagg_tab_1
-> Partial HashAggregate
Group Key: pagg_tab_2.a
-> Seq Scan on pagg_tab_p3 pagg_tab_2
Settings: enable_partitionwise_aggregate = 'on',
enable_partitionwise_join = 'on', max_parallel_workers_per_gather =
'0', enable_increm
ental_sort = 'off'

drop table if exists pagg_tab;
CREATE TABLE pagg_tab (a text, b int, c text, d int) PARTITION BY
LIST(c collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000',
'0001', '0002', '0003', '0004');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0005',
'0006', '0007', '0008');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0009',
'0010', '0011');
INSERT INTO pagg_tab SELECT (i % 20)::text, (i % 30), to_char(i % 12,
'FM0000'), i % 30 FROM generate_series(0, 2999) i;
ANALYZE pagg_tab;
EXPLAIN (COSTS OFF, settings)
SELECT a, sum(b), avg(b), count(*), max(b) FROM pagg_tab GROUP BY a;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate
Group Key: pagg_tab.a
-> Append
-> Seq Scan on pagg_tab_p1 pagg_tab_1
-> Seq Scan on pagg_tab_p2 pagg_tab_2
-> Seq Scan on pagg_tab_p3 pagg_tab_3
Settings: enable_partitionwise_aggregate = 'on',
enable_partitionwise_join = 'on', max_parallel_workers_per_gather =
'0', enable_increm
ental_sort = 'off'

it seems "PARTITION BY LIST(c collate "C");" collation compare with
"GROUP BY a;".
set collation_incompatible returned true.
make it cannot do PARTITIONWISE_AGGREGATE_PARTIAL.

but here "group by a", "a" is text data type, we can still do
PARTITIONWISE_AGGREGATE_PARTIAL
?

Good catch. Looks like I added a bug in group_by_has_partkey() --
collation_incompatible should be set only when a grouping expression
matches a partition key.

--
Thanks, Amit Langote

#20