UNIQUE null treatment option

Started by Peter Eisentrautover 4 years ago14 messageshackers
Jump to latest
#1Peter Eisentraut
peter_e@gmx.net

The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not. Different
implementations have different behaviors. In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.

This patch adds this option to PostgreSQL. The default behavior
remains UNIQUE NULLS DISTINCT. Making this happen in the btree code
is apparently pretty easy; most of the patch is just to carry the flag
around to all the places that need it.

The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.

(I named all the internal flags, catalog columns, etc. in the
negative ("nulls not distinct") so that the default PostgreSQL
behavior is the default if the flag is false. But perhaps the double
negatives make some code harder to read.)

Attachments:

0001-Add-UNIQUE-null-treatment-option.patchtext/plain; charset=UTF-8; name=0001-Add-UNIQUE-null-treatment-option.patch; x-mac-creator=0; x-mac-type=0Download+332-59
#2Marko Tiikkaja
marko@joh.to
In reply to: Peter Eisentraut (#1)
Re: UNIQUE null treatment option

On Fri, Aug 27, 2021 at 3:38 PM Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.

The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.

For the unique index syntax, should this be selectable per
column/expression, rather than for the entire index as a whole?

.m

#3Peter Eisentraut
peter_e@gmx.net
In reply to: Marko Tiikkaja (#2)
Re: UNIQUE null treatment option

On 27.08.21 14:44, Marko Tiikkaja wrote:

On Fri, Aug 27, 2021 at 3:38 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com
<mailto:peter.eisentraut@enterprisedb.com>> wrote:

In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.

The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.

For the unique index syntax, should this be selectable per
column/expression, rather than for the entire index as a whole?

Semantically, this would be possible, but the bookkeeping to make it
work seems out of proportion with the utility. And you'd have the
unique index syntax out of sync with the unique constraint syntax, which
would be pretty confusing.

#4Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#1)
Re: UNIQUE null treatment option

Here is a rebased version of this patch.

Show quoted text

On 27.08.21 14:38, Peter Eisentraut wrote:

The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not.  Different
implementations have different behaviors.  In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.

This patch adds this option to PostgreSQL.  The default behavior
remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
is apparently pretty easy; most of the patch is just to carry the flag
around to all the places that need it.

The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.

(I named all the internal flags, catalog columns, etc. in the
negative ("nulls not distinct") so that the default PostgreSQL
behavior is the default if the flag is false.  But perhaps the double
negatives make some code harder to read.)

Attachments:

v2-0001-Add-UNIQUE-null-treatment-option.patchtext/plain; charset=UTF-8; name=v2-0001-Add-UNIQUE-null-treatment-option.patchDownload+342-59
#5Zhihong Yu
zyu@yugabyte.com
In reply to: Peter Eisentraut (#4)
Re: UNIQUE null treatment option

Hi,

bool isunique;
+ bool nulls_not_distinct;
} BTSpool;

Looking at the other fields in BTSpool, there is no underscore in field
name.
I think the new field can be named nullsdistinct. This way, the
double negative is avoided.

Similar comment for new fields in BTShared and BTLeader.

And the naming would be consistent with information_schema.sql where
nulls_distinct is used:

+           CAST('YES' AS yes_or_no) AS enforced,
+           CAST(NULL AS yes_or_no) AS nulls_distinct

Cheers

#6Pavel Borisov
pashkin.elfe@gmail.com
In reply to: Peter Eisentraut (#4)
Re: UNIQUE null treatment option

+1 for commiting this feature. Consider this useful.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com <http://www.postgrespro.com&gt;

#7Maxim Orlov
orlovmg@gmail.com
In reply to: Pavel Borisov (#6)
Re: UNIQUE null treatment option

I find this patch useful. It includes changes in documentation and tests.
Code itself looks reasonable to me. Since, unique constraint check is done
by corresponding btree index, it makes this feature implementation
elegant and lightweight.

In my view, it is sufficient that heap relation can have different nulls
treatment in unique constraints for different unique columns. For example:
CREATE TABLE t (i INT UNIQUE NULLS DISTINCT, a INT UNIQUE NULLS NOT
DISTINCT);

All the tests are running ok on Linux and MacOS X.

Although, patch doesn't apply with default git apply options. Only with the
"three way merge" option (-3). Consider rebasing it, please. Then, in my
view, it can be "Ready for committer".
--
Best regards,
Maxim Orlov.

In reply to: Peter Eisentraut (#4)
Re: UNIQUE null treatment option

On Wed, Dec 29, 2021 at 2:06 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

Here is a rebased version of this patch.

BTScanInsertData.anynullkeys already effectively means "if the index
is a unique index, then we don't actually need to go through
_bt_check_unique(), or perform any other checkingunique steps". This
is really an instruction about what to do (or not do), based on the
specifics of the values for the insertion scan key plus the index
definition. In other words, the code in _bt_mkscankey() that sets up
BTScanInsertData (an insertion scankey) was written with the exact
requirements of btinsert() in mind -- nothing more.

I wonder if the logic for setting BTScanInsertData.anynullkeys inside
_bt_mkscankey() is the place to put your test for
rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That
would probably necessitate renaming anynullkeys, but that's okay. This
feels more natural to me because a NULL key column in a NULLS NOT
DISTINCT unique constraint is very similar to a NULL non-key column in
an INCLUDE index, as far as our requirements go -- and so both cases
should probably be dealt with at the same point.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#8)
Re: UNIQUE null treatment option

On Thu, Jan 13, 2022 at 10:36 AM Peter Geoghegan <pg@bowt.ie> wrote:

I wonder if the logic for setting BTScanInsertData.anynullkeys inside
_bt_mkscankey() is the place to put your test for
rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That
would probably necessitate renaming anynullkeys, but that's okay. This
feels more natural to me because a NULL key column in a NULLS NOT
DISTINCT unique constraint is very similar to a NULL non-key column in
an INCLUDE index, as far as our requirements go -- and so both cases
should probably be dealt with at the same point.

Correction: I meant to write "...a NULL key column in a NULLS DISTINCT
unique constraint is very similar...".

--
Peter Geoghegan

#10Pavel Borisov
pashkin.elfe@gmail.com
In reply to: Peter Geoghegan (#8)
Re: UNIQUE null treatment option

I wonder if the logic for setting BTScanInsertData.anynullkeys inside
_bt_mkscankey() is the place to put your test for
rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That
would probably necessitate renaming anynullkeys, but that's okay. This
feels more natural to me because a NULL key column in a NULLS NOT
DISTINCT unique constraint is very similar to a NULL non-key column in
an INCLUDE index, as far as our requirements go -- and so both cases
should probably be dealt with at the same point.

A good point, indeed!

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com <http://www.postgrespro.com&gt;

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Geoghegan (#8)
Re: UNIQUE null treatment option

On 13.01.22 19:36, Peter Geoghegan wrote:

I wonder if the logic for setting BTScanInsertData.anynullkeys inside
_bt_mkscankey() is the place to put your test for
rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That
would probably necessitate renaming anynullkeys, but that's okay. This
feels more natural to me because a NULL key column in a NULLS NOT
DISTINCT unique constraint is very similar to a NULL non-key column in
an INCLUDE index, as far as our requirements go -- and so both cases
should probably be dealt with at the same point.

Makes sense. Here is an updated patch with this change.

I didn't end up renaming anynullkeys. I came up with names like
"anyalwaysdistinctkeys", but in the end that felt too abstract, and
moreover, it would require rewriting a bunch of code comments that refer
to null values in this context. Since as you wrote, anynullkeys is just
a local concern between two functions, this slight inaccuracy is perhaps
better than some highly general but unclear terminology.

Attachments:

v3-0001-Add-UNIQUE-null-treatment-option.patchtext/plain; charset=UTF-8; name=v3-0001-Add-UNIQUE-null-treatment-option.patchDownload+347-57
#12Maxim Orlov
orlovmg@gmail.com
In reply to: Peter Eisentraut (#11)
Re: UNIQUE null treatment option

Since cfbot did failed with error, probably, unrelated to the patch itself
(see https://cirrus-ci.com/task/5330150500859904)
and repeated check did not start automatically, I reattach patch v3 to
restart cfbot on this patch.

--
Best regards,
Maxim Orlov.

Attachments:

v3-0001-Add-UNIQUE-null-treatment-option.patchapplication/octet-stream; name=v3-0001-Add-UNIQUE-null-treatment-option.patchDownload+347-57
#13Pavel Borisov
pashkin.elfe@gmail.com
In reply to: Peter Eisentraut (#11)
Re: UNIQUE null treatment option

Makes sense. Here is an updated patch with this change.

I didn't end up renaming anynullkeys. I came up with names like
"anyalwaysdistinctkeys", but in the end that felt too abstract, and
moreover, it would require rewriting a bunch of code comments that refer
to null values in this context. Since as you wrote, anynullkeys is just
a local concern between two functions, this slight inaccuracy is perhaps
better than some highly general but unclear terminology.

Agree with that. With the comment it is clear how it works.

I've looked at the patch v3. It seems good enough for me. CFbot tests have
also come green.
Suggest it is RFC now.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com <http://www.postgrespro.com&gt;

#14Peter Eisentraut
peter_e@gmx.net
In reply to: Pavel Borisov (#13)
Re: UNIQUE null treatment option

On 28.01.22 13:56, Pavel Borisov wrote:

Makes sense.  Here is an updated patch with this change.

I didn't end up renaming anynullkeys.  I came up with names like
"anyalwaysdistinctkeys", but in the end that felt too abstract, and
moreover, it would require rewriting a bunch of code comments that
refer
to null values in this context.  Since as you wrote, anynullkeys is
just
a local concern between two functions, this slight inaccuracy is
perhaps
better than some highly general but unclear terminology.

Agree with that. With the comment it is clear how it works.

I've looked at the patch v3. It seems good enough for me. CFbot tests
have also come green.
Suggest it is RFC now.

Committed. Thanks.