unique constraint with several null values

Started by Mark Lybarger · over 9 years ago · 6 messages · general
#1Mark Lybarger
mlybarger@gmail.com

I have a relation such as
create table order_item ( id uuid not null primary key, order_id numeric not
null, item_code text, make text, model text, reason text, size text,
expiration_date timestamp );

where the combination of the columns order_id, item_code, make, model,
reason, size must be unique (unless there's an expiration date).

I'm inclined to use a unique index:

create unique index unique_index_order_item_1 on order_item (order_id,
item_code, make, model, reason, size)
where expiration_date is null;

this works as expected, and the duplicate row is rejected:

insert into order_item values ( 1, 88, 'CODE', 'MAKE', 'MODEL', 'REASON',
'SIZE', null); <- first adds
insert into order_item values ( 1, 88, 'CODE', 'MAKE', 'MODEL', 'REASON',
'SIZE', null); <- rejects

However, nulls are allowed for all the columns except order_id, so when I
insert a null value, the index fails to meet my expectations:

insert into order_item values ( 1, 88, 'CODE', 'MAKE', null, 'REASON',
'SIZE', null); <- first adds
insert into order_item values ( 1, 88, 'CODE', 'MAKE', null, 'REASON',
'SIZE', null); <- adds, but should reject.

This leads me to think I need to create 2^5 = 32 partial unique indexes to
handle the various combinations of nulls that can occur. Until now, this
integrity has been handled in application code. That breaks down when the
application is multi-threaded and the rules are not enforced at the database
level.
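
As far as I can tell, the underlying behavior is that NULL never compares
equal to anything, not even another NULL, so the unique index treats the two
rows as distinct:

select null = null;                    -- yields null, not true
select null is distinct from null;     -- yields false
select 'MODEL' is distinct from null;  -- yields true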

Another solution I can think of is to just use a trigger to prevent the
duplicate rows.

Any thoughts are certainly appreciated. I can't do much about the data
model itself right now, I need to protect the integrity of the data.

Thanks!
-mark-

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Mark Lybarger (#1)
Re: unique constraint with several null values

On Wed, Jul 20, 2016 at 2:14 PM, Mark Lybarger <mlybarger@gmail.com> wrote:

I have a relation such as
create table order_item ( id uuid not null primary key, order_id number
not null, item_code text, make text, model text, reason text, size text,
expiration_date timestamp );

where the combination of the columns order_id, item_code, make, model,
reason, size must be unique (unless there's an expiration date).

I'm inclined to use a unique index:

create unique index unique_index_order_item_1 on order_item (order_id,
item_code, make, model, reason, size)
where expiration_date is null;

this works as expected and the duplicate row is rejected
:
insert into order_item ( 1, 88, 'CODE', 'MAKE','MODEL','REASON', 'SIZE',
null); <- first adds
insert into order_item ( 1, 88, 'CODE', 'MAKE','MODEL','REASON', 'SIZE',
null); <- rejects

however, nulls are allowed for all the columns except the order_id. so,
when I add a null value, it fails to meet my expectations,

insert into order_item ( 1, 88, 'CODE', 'MAKE',null,'REASON', 'SIZE',
null); <- first adds
insert into order_item ( 1, 88, 'CODE', 'MAKE',null,'REASON', 'SIZE',
null); <- adds, but should reject.

This leads me to think I need to create 2^5 or 32 unique constraints to
handle the various combinations of data that I can store. Until now, this
integrity is handled in the application code. That breaks when the
application is multi-threaded and the rules are not applied at the database
level.

Another solution I can think of is to just use a trigger to prevent the
duplicate rows.

Any thoughts are certainly appreciated. I can't do much about the data
model itself right now, I need to protect the integrity of the data.

Experimenting using 9.6

You may or may not find this helpful...

DROP TABLE IF EXISTS order_item;
DROP TYPE IF EXISTS base_order_item;

CREATE TYPE base_order_item AS (id int, order_id numeric, item_code text,
make text, model text, reason text);
create table order_item (base_item base_order_item, expiration_date
timestamp,
exclude (base_item with =) where (expiration_date is null)
);

insert into order_item VALUES ( (1, 88, 'CODE',
'MAKE',null,'REASON')::base_order_item, null); -- adds
insert into order_item VALUES ( (1, 88, 'CODE',
'MAKE',null,'REASON')::base_order_item, null); -- rejected by the constraint

I could not figure out a way to specify an arbitrary record type within the
exclusion constraint - which is kinda what you want, though it seems to
have its own issues...

SELECT ROW(1, 88, 'CODE', 'MAKE',null,'REASON')
= ROW(1, 88, 'CODE', 'MAKE',null,'REASON')
-- Yields NULL

SELECT ROW(1, 88, 'CODE', 'MAKE',null,'REASON')::base_order_item
= ROW(1, 88, 'CODE', 'MAKE',null,'REASON')::base_order_item
-- Yields TRUE

Though

SELECT ROW(1, 88, 'CODE', 'MAKE',null,'REASON')
IS DISTINCT FROM ROW(1, 88, 'CODE', 'MAKE',null,'REASON')
-- Yields FALSE

So I suppose using IS DISTINCT FROM within a trigger is going to be your
least cumbersome option.

Because exclusion constraints and indexes are operator-based, you are stuck,
in those contexts, with two null values being considered unequal.
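
A rough sketch of such a trigger (untested, the names are mine, and note
that by itself it is not race-free; it still needs explicit locking or
serializable isolation under concurrent inserts):

CREATE FUNCTION order_item_no_dup() RETURNS trigger AS $$
BEGIN
  -- IS NOT DISTINCT FROM treats two NULLs as equal, unlike plain =
  IF EXISTS (
    SELECT 1 FROM order_item o
    WHERE o.order_id = NEW.order_id
      AND o.item_code IS NOT DISTINCT FROM NEW.item_code
      AND o.make      IS NOT DISTINCT FROM NEW.make
      AND o.model     IS NOT DISTINCT FROM NEW.model
      AND o.reason    IS NOT DISTINCT FROM NEW.reason
      AND o.size      IS NOT DISTINCT FROM NEW.size
      AND o.expiration_date IS NULL
      AND o.id <> NEW.id
  ) THEN
    RAISE EXCEPTION 'duplicate order_item for order %', NEW.order_id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_item_no_dup_trg
  BEFORE INSERT OR UPDATE ON order_item
  FOR EACH ROW WHEN (NEW.expiration_date IS NULL)
  EXECUTE PROCEDURE order_item_no_dup();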

David J.

#3Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: David G. Johnston (#2)
Re: unique constraint with several null values

On Wed, Jul 20, 2016 at 1:48 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Wed, Jul 20, 2016 at 2:14 PM, Mark Lybarger <mlybarger@gmail.com> wrote:

Another solution I can think of is to just use a trigger to
prevent the duplicate rows.

If you go that route you will need to use serializable
transactions, explicit locking, or trigger-based update of some
otherwise-unneeded column to avoid race conditions. See:

https://www.postgresql.org/docs/current/static/mvcc.html

That can be a perfectly valid option as long as you handle race
conditions somehow.

Any thoughts are certainly appreciated. I can't do much about
the data model itself right now, I need to protect the integrity
of the data.

Rather than unique constraints, you could add a unique index on the
COALESCE of each column with some impossible value.
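
For the table in question that might look something like the following
(untested sketch; the sentinel string is arbitrary and must be a value that
cannot legitimately occur in the data):

create unique index order_item_nullsafe_idx on order_item (
    order_id,
    coalesce(item_code, '--NULL--'),
    coalesce(make,      '--NULL--'),
    coalesce(model,     '--NULL--'),
    coalesce(reason,    '--NULL--'),
    coalesce(size,      '--NULL--'))
where expiration_date is null;

The usual caveat applies: a row that legitimately contains the sentinel
value is indistinguishable from one containing a NULL.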

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Jerry Sievers
gsievers19@comcast.net
In reply to: Mark Lybarger (#1)
Re: unique constraint with several null values

Mark Lybarger <mlybarger@gmail.com> writes:

I have a relation such as
create table order_item ( id uuid not null primary key, order_id number not null, item_code text, make text, model text, reason text, size text, expiration_date
timestamp );

where the combination of the columns order_id, item_code, make, model, reason, size must be unique (unless there's an expiration date).  

I'm inclined to use a unique index:

create unique index unique_index_order_item_1 on order_item (order_id, item_code, make, model, reason, size)
where expiration_date is null;

this works as expected and the duplicate row is rejected
:
insert into order_item ( 1, 88, 'CODE', 'MAKE','MODEL','REASON', 'SIZE', null); <- first adds
insert into order_item ( 1, 88, 'CODE', 'MAKE','MODEL','REASON', 'SIZE', null); <- rejects

however, nulls are allowed for all the columns except the order_id. so, when I add a null value, it fails to meet my expectations, 

sj$ cat q
begin;

create table foo (a int, b int);
create unique index foox on foo (coalesce(a, 0), coalesce(b, 0));

insert into foo select 1, null;
insert into foo select 1, null;
sj$ psql -f q
SET
BEGIN
CREATE TABLE
CREATE INDEX
INSERT 0 1
psql:q:7: ERROR: duplicate key value violates unique constraint "foox"
DETAIL: Key ((COALESCE(a, 0)), (COALESCE(b, 0)))=(1, 0) already exists.
sj$

insert into order_item ( 1, 88, 'CODE', 'MAKE',null,'REASON', 'SIZE', null); <- first adds
insert into order_item ( 1, 88, 'CODE', 'MAKE',null,'REASON', 'SIZE', null); <- adds, but should reject.

This leads me to think I need to create 2^5 or 32 unique constraints to handle the various combinations of data that I can store.  Until now, this integrity is handled
in the application code.  That breaks when the application is multi-threaded and the rules are not applied at the database level.

Another solution I can think of is to just use a trigger to prevent the duplicate rows.  

Any thoughts are certainly appreciated.  I can't do much about the data model itself right now, I need to protect the integrity of the data.  

Thanks!
-mark-

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


#5Chris Travers
chris.travers@gmail.com
In reply to: Mark Lybarger (#1)
Re: unique constraint with several null values

On Wed, Jul 20, 2016 at 8:14 PM, Mark Lybarger <mlybarger@gmail.com> wrote:

I have a relation such as
create table order_item ( id uuid not null primary key, order_id number
not null, item_code text, make text, model text, reason text, size text,
expiration_date timestamp );

where the combination of the columns order_id, item_code, make, model,
reason, size must be unique (unless there's an expiration date).

I'm inclined to use a unique index:

create unique index unique_index_order_item_1 on order_item (order_id,
item_code, make, model, reason, size)
where expiration_date is null;

You can also:

create unique index unique_index_order_item_1 on order_item
(order_id, coalesce(item_code, '--NULL--'), coalesce(make,
'--NULL--'), coalesce(model, '--NULL--'), coalesce(reason, '--NULL--'),
coalesce(size, '--NULL--'), coalesce(expiration_date, '1700-01-01
00:00:00'));

this works as expected and the duplicate row is rejected
:
insert into order_item ( 1, 88, 'CODE', 'MAKE','MODEL','REASON', 'SIZE',
null); <- first adds
insert into order_item ( 1, 88, 'CODE', 'MAKE','MODEL','REASON', 'SIZE',
null); <- rejects

however, nulls are allowed for all the columns except the order_id. so,
when I add a null value, it fails to meet my expectations,

insert into order_item ( 1, 88, 'CODE', 'MAKE',null,'REASON', 'SIZE',
null); <- first adds
insert into order_item ( 1, 88, 'CODE', 'MAKE',null,'REASON', 'SIZE',
null); <- adds, but should reject.

This leads me to think I need to create 2^5 or 32 unique constraints to
handle the various combinations of data that I can store. Until now, this
integrity is handled in the application code. That breaks when the
application is multi-threaded and the rules are not applied at the database
level.

Another solution I can think of is to just use a trigger to prevent the
duplicate rows.

Any thoughts are certainly appreciated. I can't do much about the data
model itself right now, I need to protect the integrity of the data.

Thanks!
-mark-

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more

#6Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Mark Lybarger (#1)
Re: unique constraint with several null values

On 7/20/16 1:14 PM, Mark Lybarger wrote:

This leads me to think I need to create 2^5 or 32 unique constraints to
handle the various combinations of data that I can store.

Another option would be to create a unique index on a bit varying field
that has a bit set for each column that is NULL, with a WHERE <bit varying
field> != 0 clause.

Let me know if you want to go that route; I could probably add that to
http://pgxn.org/dist/count_nulls/ without much difficulty. Though a
better way to accomplish that would probably be to add a function to
count_nulls that spits out an array of the fields that are NULL; you could
then do a unique index on that WHERE the array is not empty.

Maybe a less obtuse option would be to use a boolean array. Storage
would be ~8x larger, but since there should be very few rows I doubt
that matters.
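
One possible reading of the boolean-array idea, sketched against the
original table (untested; column list assumed from the first message).
Including the array of null flags in the key also sidesteps sentinel
collisions, since a real empty string and a NULL produce different null
patterns:

create unique index order_item_null_pattern_idx on order_item (
    (array[item_code is null, make is null, model is null,
           reason is null, size is null]),
    order_id,
    coalesce(item_code, ''), coalesce(make, ''), coalesce(model, ''),
    coalesce(reason, ''), coalesce(size, ''))
where expiration_date is null;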
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
