should check collations when creating partitioned index
When creating a partitioned index, the partition key must be a subset of
the index's columns. DefineIndex() explains:
* If this table is partitioned and we're creating a unique index,
primary
* key, or exclusion constraint, make sure that the partition key is a
* subset of the index's columns. Otherwise it would be possible to
* violate uniqueness by putting values that ought to be unique in
* different partitions.
But this currently doesn't check that the collations between the
partition key and the index definition match. So you can construct a
unique index that fails to enforce uniqueness.
Here is a non-partitioned case for reference:
create collation case_insensitive (provider=icu,
locale='und-u-ks-level2', deterministic=false);
create table t0 (a int, b text);
create unique index i0 on t0 (b collate case_insensitive);
insert into t0 values (1, 'a'), (2, 'A'); -- violates unique constraint
Here is a partitioned case that doesn't work correctly:
create table t1 (a int, b text) partition by hash (b);
create table t1a partition of t1 for values with (modulus 2, remainder 0);
create table t1b partition of t1 for values with (modulus 2, remainder 1);
create unique index i1 on t1 (b collate case_insensitive);
insert into t1 values (1, 'a'), (2, 'A'); -- this succeeds
The attached patch adds the required collation check. In the example,
it would not allow the index i1 to be created.
Attachments:
0001-Check-collation-when-creating-partitioned-index.patchtext/plain; charset=UTF-8; name=0001-Check-collation-when-creating-partitioned-index.patchDownload
From 3acd59dcd7dffcfd22e61cce3ac1bb4c6982d16d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 13 Nov 2023 10:15:23 +0100
Subject: [PATCH] Check collation when creating partitioned index
---
src/backend/commands/indexcmds.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index c160d8a301..729f200395 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1011,10 +1011,20 @@ DefineIndex(Oid tableId,
{
if (key->partattrs[i] == indexInfo->ii_IndexAttrNumbers[j])
{
- /* Matched the column, now what about the equality op? */
+ /* Matched the column, now what about the collation and equality op? */
Oid idx_opfamily;
Oid idx_opcintype;
+ if (key->partcollation[i] != collationIds[j])
+ {
+ Form_pg_attribute att = TupleDescAttr(RelationGetDescr(rel), key->partattrs[i] - 1);
+
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("collation of column \"%s\" does not match between partition key and index definition",
+ NameStr(att->attname)));
+ }
+
if (get_opclass_opfamily_and_input_type(opclassIds[j],
&idx_opfamily,
&idx_opcintype))
--
2.42.1
On Mon, 2023-11-13 at 10:24 +0100, Peter Eisentraut wrote:
* If this table is partitioned and we're creating a unique index, primary
* key, or exclusion constraint, make sure that the partition key is a
* subset of the index's columns. Otherwise it would be possible to
* violate uniqueness by putting values that ought to be unique in
* different partitions.But this currently doesn't check that the collations between the
partition key and the index definition match. So you can construct a
unique index that fails to enforce uniqueness.Here is a partitioned case that doesn't work correctly:
create collation case_insensitive (provider=icu,
locale='und-u-ks-level2', deterministic=false);
create table t1 (a int, b text) partition by hash (b);
create table t1a partition of t1 for values with (modulus 2, remainder 0);
create table t1b partition of t1 for values with (modulus 2, remainder 1);
create unique index i1 on t1 (b collate case_insensitive);
insert into t1 values (1, 'a'), (2, 'A'); -- this succeedsThe attached patch adds the required collation check. In the example,
it would not allow the index i1 to be created.
The patch looks good, but I think the error message needs love:
+ ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("collation of column \"%s\" does not match between partition key and index definition", + NameStr(att->attname)));
"does not match between" sounds weird. How about
collation of index column \"%s\" must match collation of the partitioning key column
This will be backpatched, right? What if somebody already created an index like that?
Does this warrant an entry in the "however" for the release notes, or is the case
exotic enough that we can assume that nobody is affected?
Yours,
Laurenz Albe
On 13.11.23 21:04, Laurenz Albe wrote:
On Mon, 2023-11-13 at 10:24 +0100, Peter Eisentraut wrote:
* If this table is partitioned and we're creating a unique index, primary
* key, or exclusion constraint, make sure that the partition key is a
* subset of the index's columns. Otherwise it would be possible to
* violate uniqueness by putting values that ought to be unique in
* different partitions.But this currently doesn't check that the collations between the
partition key and the index definition match. So you can construct a
unique index that fails to enforce uniqueness.Here is a partitioned case that doesn't work correctly:
create collation case_insensitive (provider=icu,
locale='und-u-ks-level2', deterministic=false);
create table t1 (a int, b text) partition by hash (b);
create table t1a partition of t1 for values with (modulus 2, remainder 0);
create table t1b partition of t1 for values with (modulus 2, remainder 1);
create unique index i1 on t1 (b collate case_insensitive);
insert into t1 values (1, 'a'), (2, 'A'); -- this succeedsThe attached patch adds the required collation check. In the example,
it would not allow the index i1 to be created.The patch looks good, but I think the error message needs love:
+ ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("collation of column \"%s\" does not match between partition key and index definition", + NameStr(att->attname)));"does not match between" sounds weird. How about
collation of index column \"%s\" must match collation of the partitioning key column
This will be backpatched, right? What if somebody already created an index like that?
Does this warrant an entry in the "however" for the release notes, or is the case
exotic enough that we can assume that nobody is affected?
I think it's exotic enough that I wouldn't bother backpatching it. But
I welcome input on this.
Peter Eisentraut <peter@eisentraut.org> writes:
On 13.11.23 21:04, Laurenz Albe wrote:
This will be backpatched, right? What if somebody already created an index like that?
Does this warrant an entry in the "however" for the release notes, or is the case
exotic enough that we can assume that nobody is affected?
I think it's exotic enough that I wouldn't bother backpatching it. But
I welcome input on this.
I think it should be back-patched.
I don't love the patch details though. It seems entirely wrong to check
this before we check the opclass match. Also, in at least some cases
the code presses on looking for another match if the current opclass
doesn't match; you've broken such cases.
regards, tom lane
On Mon, 2023-11-13 at 10:24 +0100, Peter Eisentraut wrote:
create table t1 (a int, b text) partition by hash (b);
create table t1a partition of t1 for values with (modulus 2,
remainder 0);
create table t1b partition of t1 for values with (modulus 2,
remainder 1);
create unique index i1 on t1 (b collate case_insensitive);
insert into t1 values (1, 'a'), (2, 'A'); -- this succeedsThe attached patch adds the required collation check. In the
example,
it would not allow the index i1 to be created.
In the patch, you check for an exact collation match. Considering this
case only depends on equality, I think it would be correct if the
requirement was that (a) both collations are deterministic; or (b) the
collations match exactly.
This is related to the discussion here:
/messages/by-id/b7a9f32eee8d24518f791168bc6fb653d1f95f4d.camel@j-davis.com
Regards,
Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes:
In the patch, you check for an exact collation match. Considering this
case only depends on equality, I think it would be correct if the
requirement was that (a) both collations are deterministic; or (b) the
collations match exactly.
You keep harping on this idea that we are only concerned with equality,
but I think you are wrong. We expect a btree index to provide ordering
not only equality, and this example definitely is a btree index.
Possibly, with a great deal more specificity added to the check, we
could distinguish the cases where ordering can't matter and allow
collation variance then. I do not see the value of that, especially
not when measured against the risk of introducing subtle bugs.
regards, tom lane
On Fri, 2023-11-17 at 15:18 -0500, Tom Lane wrote:
You keep harping on this idea that we are only concerned with
equality,
but I think you are wrong. We expect a btree index to provide
ordering
not only equality, and this example definitely is a btree index.Possibly, with a great deal more specificity added to the check, we
could distinguish the cases where ordering can't matter and allow
collation variance then. I do not see the value of that, especially
not when measured against the risk of introducing subtle bugs.
Fair point.
As background, I don't see a complete solution to our collation
problems and on the horizon. You've probably noticed that I'm looking
for various ways to mitigate the problem, and this thread was about
reducing the number of situations in which we rely on collation.
I'll focus on other potential improvements/mitigations and see if I can
make progress somewhere else.
Regards,
Jeff Davis
On 14.11.23 17:15, Tom Lane wrote:
I don't love the patch details though. It seems entirely wrong to check
this before we check the opclass match.
Not sure why? The order doesn't seem to matter?
Also, in at least some cases
the code presses on looking for another match if the current opclass
doesn't match; you've broken such cases.
I see. That means we shouldn't raise an error on a mismatch but just do
if (key->partcollation[i] != collationIds[j])
continue;
and then let the existing error under if (!found) complain.
I suppose we could move that into the
if (get_opclass_opfamily_and_input_type(...))
block. I'm not sure I see the difference.
Peter Eisentraut <peter@eisentraut.org> writes:
On 14.11.23 17:15, Tom Lane wrote:
I don't love the patch details though. It seems entirely wrong to check
this before we check the opclass match.
Not sure why? The order doesn't seem to matter?
The case that was bothering me was if we had a non-collated type
versus a collated type. That would result in throwing an error
about collation mismatch, when complaining about the opclass seems
more apropos. However, if we do this:
I see. That means we shouldn't raise an error on a mismatch but just do
if (key->partcollation[i] != collationIds[j])
continue;
it might not matter much.
regards, tom lane
On 20.11.23 17:25, Tom Lane wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
On 14.11.23 17:15, Tom Lane wrote:
I don't love the patch details though. It seems entirely wrong to check
this before we check the opclass match.Not sure why? The order doesn't seem to matter?
The case that was bothering me was if we had a non-collated type
versus a collated type. That would result in throwing an error
about collation mismatch, when complaining about the opclass seems
more apropos. However, if we do this:I see. That means we shouldn't raise an error on a mismatch but just do
if (key->partcollation[i] != collationIds[j])
continue;it might not matter much.
Here is an updated patch that works as indicated above.
The behavior if you try to create an index with mismatching collations
now is that it will skip over the column and complain at the end with
something like
ERROR: 0A000: unique constraint on partitioned table must include all
partitioning columns
DETAIL: UNIQUE constraint on table "t1" lacks column "b" which is part
of the partition key.
which perhaps isn't intuitive, but I think it would be the same if you
somehow tried to build an index with different operator classes than the
partitioning. I think these less-specific error messages are ok in such
edge cases.
Attachments:
v2-0001-Check-collation-when-creating-partitioned-index.patchtext/plain; charset=UTF-8; name=v2-0001-Check-collation-when-creating-partitioned-index.patchDownload
From 869a66c429eb188ceafcbd972b6e46b63fce88f3 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 23 Nov 2023 10:37:21 +0100
Subject: [PATCH v2] Check collation when creating partitioned index
When creating a partitioned index, the partition key must be a subset
of the index's columns. But this currently doesn't check that the
collations between the partition key and the index definition match.
So you can construct a unique index that fails to enforce uniqueness.
(This would most likely involve a nondeterministic collation, so it
would have to be crafted explicitly and is not something that would
just happen by accident.)
This patch adds the required collation check. As a result, any
previously allowed unique index that has a collation mismatch would no
longer be allowed to be created.
Discussion: https://www.postgresql.org/message-id/flat/3327cb54-f7f1-413b-8fdb-7a9dceebb938%40eisentraut.org
---
src/backend/commands/indexcmds.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 0b3b8e98b8..c7ecedbe3b 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1011,10 +1011,13 @@ DefineIndex(Oid tableId,
{
if (key->partattrs[i] == indexInfo->ii_IndexAttrNumbers[j])
{
- /* Matched the column, now what about the equality op? */
+ /* Matched the column, now what about the collation and equality op? */
Oid idx_opfamily;
Oid idx_opcintype;
+ if (key->partcollation[i] != collationIds[j])
+ continue;
+
if (get_opclass_opfamily_and_input_type(opclassIds[j],
&idx_opfamily,
&idx_opcintype))
--
2.42.1
On 23.11.23 11:01, Peter Eisentraut wrote:
On 20.11.23 17:25, Tom Lane wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
On 14.11.23 17:15, Tom Lane wrote:
I don't love the patch details though. It seems entirely wrong to
check
this before we check the opclass match.Not sure why? The order doesn't seem to matter?
The case that was bothering me was if we had a non-collated type
versus a collated type. That would result in throwing an error
about collation mismatch, when complaining about the opclass seems
more apropos. However, if we do this:I see. That means we shouldn't raise an error on a mismatch but just do
if (key->partcollation[i] != collationIds[j])
continue;it might not matter much.
Here is an updated patch that works as indicated above.
The behavior if you try to create an index with mismatching collations
now is that it will skip over the column and complain at the end with
something likeERROR: 0A000: unique constraint on partitioned table must include all
partitioning columns
DETAIL: UNIQUE constraint on table "t1" lacks column "b" which is part
of the partition key.which perhaps isn't intuitive, but I think it would be the same if you
somehow tried to build an index with different operator classes than the
partitioning. I think these less-specific error messages are ok in such
edge cases.
If there are no further comments on this patch version, I plan to go
ahead and commit it soon.
Peter Eisentraut <peter@eisentraut.org> writes:
Here is an updated patch that works as indicated above.
The behavior if you try to create an index with mismatching collations
now is that it will skip over the column and complain at the end with
something likeERROR: 0A000: unique constraint on partitioned table must include all
partitioning columns
DETAIL: UNIQUE constraint on table "t1" lacks column "b" which is part
of the partition key.which perhaps isn't intuitive, but I think it would be the same if you
somehow tried to build an index with different operator classes than the
partitioning. I think these less-specific error messages are ok in such
edge cases.
If there are no further comments on this patch version, I plan to go
ahead and commit it soon.
Sorry for slow response --- I've been dealing with a little too much
$REAL_LIFE lately. Anyway, I'm content with the v2 patch. I see
that the existing code works a little harder than this to produce
an on-point error message for mismatching operator, but after
studying that I'm not totally convinced that it's ideal behavior
either. I think we can wait for some field complaints to see if
we need a better error message for mismatching collation, and
if so what the shape of the bad input is exactly.
regards, tom lane