should check collations when creating partitioned index

Started by Peter Eisentrautover 2 years ago12 messageshackers

peter_e@gmx.net

over 2 years ago

When creating a partitioned index, the partition key must be a subset of
the index's columns. DefineIndex() explains:

* If this table is partitioned and we're creating a unique index,
primary
* key, or exclusion constraint, make sure that the partition key is a
* subset of the index's columns. Otherwise it would be possible to
* violate uniqueness by putting values that ought to be unique in
* different partitions.

But this currently doesn't check that the collations between the
partition key and the index definition match. So you can construct a
unique index that fails to enforce uniqueness.

Here is a non-partitioned case for reference:

create collation case_insensitive (provider=icu,
locale='und-u-ks-level2', deterministic=false);
create table t0 (a int, b text);
create unique index i0 on t0 (b collate case_insensitive);
insert into t0 values (1, 'a'), (2, 'A'); -- violates unique constraint

Here is a partitioned case that doesn't work correctly:

create table t1 (a int, b text) partition by hash (b);
create table t1a partition of t1 for values with (modulus 2, remainder 0);
create table t1b partition of t1 for values with (modulus 2, remainder 1);
create unique index i1 on t1 (b collate case_insensitive);
insert into t1 values (1, 'a'), (2, 'A'); -- this succeeds

The attached patch adds the required collation check. In the example,
it would not allow the index i1 to be created.

Laurenz Albe

laurenz.albe@cybertec.at

over 2 years ago

In reply to: Peter Eisentraut (#1)

Re: should check collations when creating partitioned index

On Mon, 2023-11-13 at 10:24 +0100, Peter Eisentraut wrote:

* If this table is partitioned and we're creating a unique index, primary
* key, or exclusion constraint, make sure that the partition key is a
* subset of the index's columns. Otherwise it would be possible to
* violate uniqueness by putting values that ought to be unique in
* different partitions.

But this currently doesn't check that the collations between the
partition key and the index definition match. So you can construct a
unique index that fails to enforce uniqueness.

Here is a partitioned case that doesn't work correctly:

create collation case_insensitive (provider=icu,
locale='und-u-ks-level2', deterministic=false);
create table t1 (a int, b text) partition by hash (b);
create table t1a partition of t1 for values with (modulus 2, remainder 0);
create table t1b partition of t1 for values with (modulus 2, remainder 1);
create unique index i1 on t1 (b collate case_insensitive);
insert into t1 values (1, 'a'), (2, 'A'); -- this succeeds

The attached patch adds the required collation check. In the example,
it would not allow the index i1 to be created.

The patch looks good, but I think the error message needs love:

+		ereport(ERROR,
+			errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("collation of column \"%s\" does not match between partition key and index definition",
+				   NameStr(att->attname)));

"does not match between" sounds weird. How about

collation of index column \"%s\" must match collation of the partitioning key column

This will be backpatched, right? What if somebody already created an index like that?
Does this warrant an entry in the "however" for the release notes, or is the case
exotic enough that we can assume that nobody is affected?

Yours,
Laurenz Albe

Peter Eisentraut

peter_e@gmx.net

over 2 years ago

In reply to: Laurenz Albe (#2)

Re: should check collations when creating partitioned index

On 13.11.23 21:04, Laurenz Albe wrote:

On Mon, 2023-11-13 at 10:24 +0100, Peter Eisentraut wrote:

* If this table is partitioned and we're creating a unique index, primary
* key, or exclusion constraint, make sure that the partition key is a
* subset of the index's columns. Otherwise it would be possible to
* violate uniqueness by putting values that ought to be unique in
* different partitions.

But this currently doesn't check that the collations between the
partition key and the index definition match. So you can construct a
unique index that fails to enforce uniqueness.

Here is a partitioned case that doesn't work correctly:

create collation case_insensitive (provider=icu,
locale='und-u-ks-level2', deterministic=false);
create table t1 (a int, b text) partition by hash (b);
create table t1a partition of t1 for values with (modulus 2, remainder 0);
create table t1b partition of t1 for values with (modulus 2, remainder 1);
create unique index i1 on t1 (b collate case_insensitive);
insert into t1 values (1, 'a'), (2, 'A'); -- this succeeds

The attached patch adds the required collation check. In the example,
it would not allow the index i1 to be created.

The patch looks good, but I think the error message needs love:
+		ereport(ERROR,
+			errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("collation of column \"%s\" does not match between partition key and index definition",
+				   NameStr(att->attname)));
"does not match between" sounds weird. How about

collation of index column \"%s\" must match collation of the partitioning key column

This will be backpatched, right? What if somebody already created an index like that?
Does this warrant an entry in the "however" for the release notes, or is the case
exotic enough that we can assume that nobody is affected?

I think it's exotic enough that I wouldn't bother backpatching it. But
I welcome input on this.

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Peter Eisentraut (#3)

Re: should check collations when creating partitioned index

Peter Eisentraut <peter@eisentraut.org> writes:

On 13.11.23 21:04, Laurenz Albe wrote:

This will be backpatched, right? What if somebody already created an index like that?
Does this warrant an entry in the "however" for the release notes, or is the case
exotic enough that we can assume that nobody is affected?

I think it's exotic enough that I wouldn't bother backpatching it. But
I welcome input on this.

I think it should be back-patched.

I don't love the patch details though. It seems entirely wrong to check
this before we check the opclass match. Also, in at least some cases
the code presses on looking for another match if the current opclass
doesn't match; you've broken such cases.

regards, tom lane

Jeff Davis

pgsql@j-davis.com

over 2 years ago

In reply to: Peter Eisentraut (#1)

Re: should check collations when creating partitioned index

On Mon, 2023-11-13 at 10:24 +0100, Peter Eisentraut wrote:

create table t1 (a int, b text) partition by hash (b);
create table t1a partition of t1 for values with (modulus 2,
remainder 0);
create table t1b partition of t1 for values with (modulus 2,
remainder 1);
create unique index i1 on t1 (b collate case_insensitive);
insert into t1 values (1, 'a'), (2, 'A'); -- this succeeds

The attached patch adds the required collation check. In the
example,
it would not allow the index i1 to be created.

In the patch, you check for an exact collation match. Considering this
case only depends on equality, I think it would be correct if the
requirement was that (a) both collations are deterministic; or (b) the
collations match exactly.

This is related to the discussion here:

/messages/by-id/b7a9f32eee8d24518f791168bc6fb653d1f95f4d.camel@j-davis.com

Regards,
Jeff Davis

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Jeff Davis (#5)

Re: should check collations when creating partitioned index

Jeff Davis <pgsql@j-davis.com> writes:

In the patch, you check for an exact collation match. Considering this
case only depends on equality, I think it would be correct if the
requirement was that (a) both collations are deterministic; or (b) the
collations match exactly.

You keep harping on this idea that we are only concerned with equality,
but I think you are wrong. We expect a btree index to provide ordering
not only equality, and this example definitely is a btree index.

Possibly, with a great deal more specificity added to the check, we
could distinguish the cases where ordering can't matter and allow
collation variance then. I do not see the value of that, especially
not when measured against the risk of introducing subtle bugs.

regards, tom lane

Jeff Davis

pgsql@j-davis.com

over 2 years ago

In reply to: Tom Lane (#6)

Re: should check collations when creating partitioned index

On Fri, 2023-11-17 at 15:18 -0500, Tom Lane wrote:

You keep harping on this idea that we are only concerned with
equality,
but I think you are wrong. We expect a btree index to provide
ordering
not only equality, and this example definitely is a btree index.

Possibly, with a great deal more specificity added to the check, we
could distinguish the cases where ordering can't matter and allow
collation variance then. I do not see the value of that, especially
not when measured against the risk of introducing subtle bugs.

Fair point.

As background, I don't see a complete solution to our collation
problems and on the horizon. You've probably noticed that I'm looking
for various ways to mitigate the problem, and this thread was about
reducing the number of situations in which we rely on collation.

I'll focus on other potential improvements/mitigations and see if I can
make progress somewhere else.

Regards,
Jeff Davis

Peter Eisentraut

peter_e@gmx.net

over 2 years ago

In reply to: Tom Lane (#4)

Re: should check collations when creating partitioned index

On 14.11.23 17:15, Tom Lane wrote:

I don't love the patch details though. It seems entirely wrong to check
this before we check the opclass match.

Not sure why? The order doesn't seem to matter?

Also, in at least some cases
the code presses on looking for another match if the current opclass
doesn't match; you've broken such cases.

I see. That means we shouldn't raise an error on a mismatch but just do

if (key->partcollation[i] != collationIds[j])
continue;

and then let the existing error under if (!found) complain.

I suppose we could move that into the

if (get_opclass_opfamily_and_input_type(...))

block. I'm not sure I see the difference.

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Peter Eisentraut (#8)

Re: should check collations when creating partitioned index

Peter Eisentraut <peter@eisentraut.org> writes:

On 14.11.23 17:15, Tom Lane wrote:

I don't love the patch details though. It seems entirely wrong to check
this before we check the opclass match.

Not sure why? The order doesn't seem to matter?

The case that was bothering me was if we had a non-collated type
versus a collated type. That would result in throwing an error
about collation mismatch, when complaining about the opclass seems
more apropos. However, if we do this:

I see. That means we shouldn't raise an error on a mismatch but just do
if (key->partcollation[i] != collationIds[j])
continue;

it might not matter much.

regards, tom lane

#10

Peter Eisentraut

peter_e@gmx.net

over 2 years ago

In reply to: Tom Lane (#9)

Re: should check collations when creating partitioned index

On 20.11.23 17:25, Tom Lane wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

On 14.11.23 17:15, Tom Lane wrote:

I don't love the patch details though. It seems entirely wrong to check
this before we check the opclass match.

Not sure why? The order doesn't seem to matter?

The case that was bothering me was if we had a non-collated type
versus a collated type. That would result in throwing an error
about collation mismatch, when complaining about the opclass seems
more apropos. However, if we do this:

I see. That means we shouldn't raise an error on a mismatch but just do
if (key->partcollation[i] != collationIds[j])
continue;

it might not matter much.

Here is an updated patch that works as indicated above.

The behavior if you try to create an index with mismatching collations
now is that it will skip over the column and complain at the end with
something like

ERROR: 0A000: unique constraint on partitioned table must include all
partitioning columns
DETAIL: UNIQUE constraint on table "t1" lacks column "b" which is part
of the partition key.

which perhaps isn't intuitive, but I think it would be the same if you
somehow tried to build an index with different operator classes than the
partitioning. I think these less-specific error messages are ok in such
edge cases.

#11

Peter Eisentraut

peter_e@gmx.net

over 2 years ago

In reply to: Peter Eisentraut (#10)

Re: should check collations when creating partitioned index

On 23.11.23 11:01, Peter Eisentraut wrote:

On 20.11.23 17:25, Tom Lane wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

On 14.11.23 17:15, Tom Lane wrote:

I don't love the patch details though. It seems entirely wrong to
check
this before we check the opclass match.

Not sure why? The order doesn't seem to matter?

The case that was bothering me was if we had a non-collated type
versus a collated type. That would result in throwing an error
about collation mismatch, when complaining about the opclass seems
more apropos. However, if we do this:

I see. That means we shouldn't raise an error on a mismatch but just do
if (key->partcollation[i] != collationIds[j])
continue;

it might not matter much.

Here is an updated patch that works as indicated above.

The behavior if you try to create an index with mismatching collations
now is that it will skip over the column and complain at the end with
something like

ERROR: 0A000: unique constraint on partitioned table must include all
partitioning columns
DETAIL: UNIQUE constraint on table "t1" lacks column "b" which is part
of the partition key.

which perhaps isn't intuitive, but I think it would be the same if you
somehow tried to build an index with different operator classes than the
partitioning. I think these less-specific error messages are ok in such
edge cases.

If there are no further comments on this patch version, I plan to go
ahead and commit it soon.

#12

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Peter Eisentraut (#11)

Re: should check collations when creating partitioned index

Peter Eisentraut <peter@eisentraut.org> writes:

Here is an updated patch that works as indicated above.

The behavior if you try to create an index with mismatching collations
now is that it will skip over the column and complain at the end with
something like

ERROR: 0A000: unique constraint on partitioned table must include all
partitioning columns
DETAIL: UNIQUE constraint on table "t1" lacks column "b" which is part
of the partition key.

which perhaps isn't intuitive, but I think it would be the same if you
somehow tried to build an index with different operator classes than the
partitioning. I think these less-specific error messages are ok in such
edge cases.

If there are no further comments on this patch version, I plan to go
ahead and commit it soon.

Sorry for slow response --- I've been dealing with a little too much
$REAL_LIFE lately. Anyway, I'm content with the v2 patch. I see
that the existing code works a little harder than this to produce
an on-point error message for mismatching operator, but after
studying that I'm not totally convinced that it's ideal behavior
either. I think we can wait for some field complaints to see if
we need a better error message for mismatching collation, and
if so what the shape of the bad input is exactly.

regards, tom lane

should check collations when creating partitioned index

Attachments:

Attachments: