Unique Index
Hi,
i have a unique index on a table over multiple columns. If now one of
the records has a null value in one of the indexed columns i can insert
the same record multiple times.
Is this a problem within postgres or expected?
Example:
index unique, btree (colA, colB, colC);
would still allow me to insert
AAAA, ,CCCC
AAAA, ,CCCC
Thanks for any help on that.
A
On Thu, 20 Jan 2005, Alex wrote:
i have a unique index on a table over multiple columns. If now one of
the records has a null value in one of the indexed columns i can insert
the same record multiple times.Is this a problem within postgres or expected?
Expected. NULLs are effectively not considered as the same for the
purposes of UNIQUE.
The predicate basically functions as:
2) If there are no two rows in T such that the value of each column
in one row is non-null and is equal to the value of the cor-
responding column in the other row according to Subclause 8.2,
"<comparison predicate>", then the result of the <unique predi-
cate> is true; otherwise, the result of the <unique predicate>
is false.
# alex@meerkatsoft.com / 2005-01-20 01:35:32 +1100:
i have a unique index on a table over multiple columns. If now one of
the records has a null value in one of the indexed columns i can insert
the same record multiple times.Is this a problem within postgres or expected?
In SQL, NULL means "unknown value". How could you assert that two
NULLs are equal?
--
If you cc me or remove the list(s) completely I'll most likely ignore
your message. see http://www.eyrie.org./~eagle/faqs/questions.html
Roman Neuhauser wrote:
# alex@meerkatsoft.com / 2005-01-20 01:35:32 +1100:
i have a unique index on a table over multiple columns. If now one of
the records has a null value in one of the indexed columns i can insert
the same record multiple times.Is this a problem within postgres or expected?
In SQL, NULL means "unknown value". How could you assert that two
NULLs are equal?
which doesn't make mathematical sense.
mathwise null is an empty result.
so setting the logic up using the math logic, null values are always equal.
--
========================================
only plain text format email accepted.
smaller file size, no virus transfer
no proprietary file formats.
========================================
According to Date you should never use NULLs. This is because a NULL can
mean many different things. It can mean not known (e.g. I know he has an
age but I don't know what it is), It can be not applicable (e.g. in a
Party table of organizations and people, people would be of a certain sex
but an organization would not), It can mean a number divided by zero, It
can also mean I don't know if the value is unknown or if the value is not
applicable etc. etc. In Date's view there is an infinite number of
meanings for null and for a database to handle it correctly would require
an equivalent set of operators for each type of null.
I think though that Nulls are a very useful feature of SQL databases. It's
just that when I write a program, I have to know in a particular context
what a null means and handle it appropriately (of course controlling nulls
in an index is a different matter). I don't think there is only one
accepted way to use nulls.
Show quoted text
Roman Neuhauser wrote:
# alex@meerkatsoft.com / 2005-01-20 01:35:32 +1100:
i have a unique index on a table over multiple columns. If now one of
the records has a null value in one of the indexed columns i can insert
the same record multiple times.Is this a problem within postgres or expected?
In SQL, NULL means "unknown value". How could you assert that two
NULLs are equal?which doesn't make mathematical sense.
mathwise null is an empty result.
so setting the logic up using the math logic, null values are always
equal.--
========================================only plain text format email accepted.
smaller file size, no virus transfer
no proprietary file formats.========================================
Null values are a big surprise to almost every end-user (though the
programmers are OK with them).
Look at the astonishment on the face of your end user when you tell them
that:
SELECT COUNT(*) FROM clothing WHERE clothing_color = 'green'
+
SELECT COUNT(*) FROM clothing WHERE NOT clothing_color = 'green'
Is not the count of all clothing because clothing without a color
recorded will not be counted.
Or (perhaps better yet, violating trichotomy) ...
If <Some_column> has a null numeric value, then ALL of the following are
FALSE for that case:
Some_column < 0
Some_column > 0
Some_column = 0
Some_column <> 0 // This is the one that many find surprising
Some_column <= 0
Some_column >= 0
You can probably see why Null values can do strange things in (for
instance) an index.
Even at that, I think that being able to insert more than one null value
into a unique index should be considered as a bug (or diagnosed as an
error).
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of
vhikida@inreach.com
Sent: Wednesday, January 19, 2005 3:30 PM
To: J. Greenlees
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Unique Index
According to Date you should never use NULLs. This is because a NULL can
mean many different things. It can mean not known (e.g. I know he has an
age but I don't know what it is), It can be not applicable (e.g. in a
Party table of organizations and people, people would be of a certain
sex
but an organization would not), It can mean a number divided by zero, It
can also mean I don't know if the value is unknown or if the value is
not
applicable etc. etc. In Date's view there is an infinite number of
meanings for null and for a database to handle it correctly would
require
an equivalent set of operators for each type of null.
I think though that Nulls are a very useful feature of SQL databases.
It's
just that when I write a program, I have to know in a particular context
what a null means and handle it appropriately (of course controlling
nulls
in an index is a different matter). I don't think there is only one
accepted way to use nulls.
Import Notes
Resolved by subject fallback
"Dann Corbit" <DCorbit@connx.com> writes:
Or (perhaps better yet, violating trichotomy) ...
If <Some_column> has a null numeric value, then ALL of the following are
FALSE for that case:
Some_column < 0
Some_column > 0
Some_column = 0
Some_column <> 0 // This is the one that many find surprising
Some_column <= 0
Some_column >= 0
It's worse than that: the above do *not* yield FALSE, they yield NULL.
Which does act like FALSE in a simple WHERE clause, but there are other
cases (like CHECK clauses) where it doesn't. "x NOT IN (SELECT ...)"
is a case that newbies routinely get bitten by.
Even at that, I think that being able to insert more than one null value
into a unique index should be considered as a bug (or diagnosed as an
error).
Direct your complaints to the ISO SQL standards committee.
regards, tom lane
The ISO SQL Standard does not even define an index, and so any index is
an extension to the standard (though primary keys and foreign keys imply
them).
At least in the SQL Standard that I have (ANSI/ISO/IEC 9075-1-1999 and
related documents) has no definition of an index. Perhaps the newer
version contains such a definition.
So, in the creation of an extension, I think it is up to the programmer
to do whatever is best.
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, January 19, 2005 4:18 PM
To: Dann Corbit
Cc: vhikida@inreach.com; J. Greenlees; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Unique Index
"Dann Corbit" <DCorbit@connx.com> writes:
Or (perhaps better yet, violating trichotomy) ...
If <Some_column> has a null numeric value, then ALL of the following
are
FALSE for that case:
Some_column < 0
Some_column > 0
Some_column = 0
Some_column <> 0 // This is the one that many find surprising
Some_column <= 0
Some_column >= 0
It's worse than that: the above do *not* yield FALSE, they yield NULL.
Which does act like FALSE in a simple WHERE clause, but there are other
cases (like CHECK clauses) where it doesn't. "x NOT IN (SELECT ...)"
is a case that newbies routinely get bitten by.
Even at that, I think that being able to insert more than one null
value
into a unique index should be considered as a bug (or diagnosed as an
error).
Direct your complaints to the ISO SQL standards committee.
regards, tom lane
Import Notes
Resolved by subject fallback
On Wed, 19 Jan 2005, Dann Corbit wrote:
Even at that, I think that being able to insert more than one null value
into a unique index should be considered as a bug (or diagnosed as an
error).
AFAICT the UNIQUE constraint that it's used to model explicitly allows
multiple NULLs in the spec so I don't see making it error as being
terribly workable.
True, but the standard says nothing about the creation of an index, so
you can make it behave in any way that you see fit.
-----Original Message-----
From: Stephan Szabo [mailto:sszabo@megazone.bigpanda.com]
Sent: Wednesday, January 19, 2005 4:27 PM
To: Dann Corbit
Cc: vhikida@inreach.com; J. Greenlees; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Unique Index
On Wed, 19 Jan 2005, Dann Corbit wrote:
Even at that, I think that being able to insert more than one null
value
into a unique index should be considered as a bug (or diagnosed as an
error).
AFAICT the UNIQUE constraint that it's used to model explicitly allows
multiple NULLs in the spec so I don't see making it error as being
terribly workable.
Import Notes
Resolved by subject fallback
On Wed, 19 Jan 2005, Dann Corbit wrote:
True, but the standard says nothing about the creation of an index, so
you can make it behave in any way that you see fit.
The unique index is however used to model the unique constraint in
PostgreSQL which I had thought was clear from my statement so giving the
unique index behavior which makes it unable to model the constraint
wouldn't be terribly workable (without rewriting the constraint to be
modeled in a separate fashion).
Show quoted text
-----Original Message-----
From: Stephan Szabo [mailto:sszabo@megazone.bigpanda.com]
Sent: Wednesday, January 19, 2005 4:27 PM
To: Dann Corbit
Cc: vhikida@inreach.com; J. Greenlees; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Unique IndexOn Wed, 19 Jan 2005, Dann Corbit wrote:
Even at that, I think that being able to insert more than one null
value
into a unique index should be considered as a bug (or diagnosed as an
error).AFAICT the UNIQUE constraint that it's used to model explicitly allows
multiple NULLs in the spec so I don't see making it error as being
terribly workable.
On Wed, 19 Jan 2005, Stephan Szabo wrote:
On Wed, 19 Jan 2005, Dann Corbit wrote:
True, but the standard says nothing about the creation of an index, so
you can make it behave in any way that you see fit.The unique index is however used to model the unique constraint in
PostgreSQL which I had thought was clear from my statement so giving the
unique index behavior which makes it unable to model the constraint
wouldn't be terribly workable (without rewriting the constraint to be
modeled in a separate fashion).
Actually, thinking about it, I think I generally disagree with the idea
that unique is the correct word to use in any case. It's used in both the
constraint and in the predicate to have its current meaning (as described
in my earlier message) and I think the word distinct already the semantics
of treating two NULLs in the fashion such that an index wouldn't allow
two.
I actually just wanted to know if there is a way around this problem.
Obviously it is implemented that way for whatever reason.
I still though think some arguments given in some of the replies, while
probably correct, are besides the point.
I use a unique index that may contain null values. On an insert or
update I can now not rely on the exception thrown but actually have to
write a select statement to check if the same row exists, which I
believe defies ONE purpose of having unique indices. Whether Null is
associated with "unknown value", "divided by zero"... or however one
wants to interpret it is not the issue here, in my view NULL in the same
column have the same value or at least should be treated the same. (If I
want to differentiate the state, I would use a code instead of NULL as a
NULL does not give any indication of its meaning, thus we could safely
assume they are treated as equal).
Maybe there could be an option in the creation of the index to indicate
on how to use NULL values.
How do other DBMS handle this?
A
Tom Lane wrote:
Show quoted text
"Dann Corbit" <DCorbit@connx.com> writes:
Or (perhaps better yet, violating trichotomy) ...
If <Some_column> has a null numeric value, then ALL of the following are
FALSE for that case:Some_column < 0
Some_column > 0
Some_column = 0
Some_column <> 0 // This is the one that many find surprising
Some_column <= 0
Some_column >= 0It's worse than that: the above do *not* yield FALSE, they yield NULL.
Which does act like FALSE in a simple WHERE clause, but there are other
cases (like CHECK clauses) where it doesn't. "x NOT IN (SELECT ...)"
is a case that newbies routinely get bitten by.Even at that, I think that being able to insert more than one null value
into a unique index should be considered as a bug (or diagnosed as an
error).Direct your complaints to the ISO SQL standards committee.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
Tom Lane <tgl@sss.pgh.pa.us> writes:
"Dann Corbit" <DCorbit@connx.com> writes:
Even at that, I think that being able to insert more than one null value
into a unique index should be considered as a bug (or diagnosed as an
error).Direct your complaints to the ISO SQL standards committee.
The SQL standard generally treats NULLs as a escape hatch for constraints.
That's true for CHECK constraints as well: if you have a "CHECK a>0"
constraint but a is nullable then a NULL value is allowed even though the
check constraint can't be verified.
This isn't an unreasonable approach. Once you have NULLs it makes more sense
to make them useful than to treat them as just another value. You can always
make the column NOT NULL if you don't want any exceptions to your constraints.
--
greg
Alex <alex@meerkatsoft.com> writes:
I actually just wanted to know if there is a way around this problem. Obviously
it is implemented that way for whatever reason.
The way around is to make all the columns NOT NULL. For most applications
unique indexes don't make much sense on nullable columns.
(If I want to differentiate the state, I would use a code instead of NULL as
a NULL does not give any indication of its meaning, thus we could safely
assume they are treated as equal).
I think you have that backwards. You use NULL in the case where you want all
cases to compare as unknown values. If you want them to compare as known
values then you should use a special value.
In other words, NULL has special properties. You should use it if those
special properties are what you want. If you want the system to enforce a
unique constraint on the special value then you probably don't want to be
using NULL for that special state.
Specifically if you find yourself saying "we could safely assume they are
treated equal" then NULL is almost certainly not what you want to represent
that. NULL never compares equal to anything.
--
greg
Dann Corbit wrote:
True, but the standard says nothing about the creation of an index, so
you can make it behave in any way that you see fit.
But I thought we are talking about unique _constraint_ here (which is
certainly regulated by the standard).
--
dave
On Jan 20, 2005, at 16:03, David Garamond wrote:
Dann Corbit wrote:
True, but the standard says nothing about the creation of an index, so
you can make it behave in any way that you see fit.But I thought we are talking about unique _constraint_ here (which is
certainly regulated by the standard).
They could conceivably be separated. The standard likewise doesn't say
anything about how the unique constraint is enforced. In PostgreSQL, a
"unique" index is used to enforce the constraint, but the unique index
is not intrinsically tied to the unique constraint.
Michael Glaesemann
grzm myrealbox com
I actually just wanted to know if there is a way around this problem.
Obviously it is implemented that way for whatever reason.I still though think some arguments given in some of the replies, while
probably correct, are besides the point.
Sorry. I was hoping someone else would answer.
I use a unique index that may contain null values. On an insert or update
I can now not rely on the exception thrown but actually have to write a
select statement to check if the same row exists, which I believe defies
ONE purpose of having unique indices. Whether Null is associated with
"unknown value", "divided by zero"... or however one wants to interpret it
is not the issue here, in my view NULL in the same column have the same
value or at least should be treated the same. (If I want to differentiate
the state, I would use a code instead of NULL as a NULL does not give any
indication of its meaning, thus we could safely assume they are treated as
equal).Maybe there could be an option in the creation of the index to indicate on
how to use NULL values.
I can think of two options.
One was mentioned already. If only one row can have a null value then it
seems to me that you should make it a non null and null would have a special
code.
If it really needs to be null. Then a rather messy solution would be to have
a second column (I'll call it a null indicator) which can only be 1 or null
and have a unique index on it.
colA ind
------ -----
1 null
2 null
3 null
null 1
How do other DBMS handle this?
Oracle is the same.
Show quoted text
A
Tom Lane wrote:
"Dann Corbit" <DCorbit@connx.com> writes:
Or (perhaps better yet, violating trichotomy) ...
If <Some_column> has a null numeric value, then ALL of the following are
FALSE for that case:Some_column < 0
Some_column > 0
Some_column = 0 Some_column <> 0 // This is the one that many find
surprising
Some_column <= 0
Some_column >= 0It's worse than that: the above do *not* yield FALSE, they yield NULL.
Which does act like FALSE in a simple WHERE clause, but there are other
cases (like CHECK clauses) where it doesn't. "x NOT IN (SELECT ...)"
is a case that newbies routinely get bitten by.Even at that, I think that being able to insert more than one null value
into a unique index should be considered as a bug (or diagnosed as an
error).Direct your complaints to the ISO SQL standards committee.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
Hi,
Am Mittwoch, den 19.01.2005, 15:02 -0800 schrieb J. Greenlees:
Roman Neuhauser wrote:
# alex@meerkatsoft.com / 2005-01-20 01:35:32 +1100:
i have a unique index on a table over multiple columns. If now one of
the records has a null value in one of the indexed columns i can insert
the same record multiple times.Is this a problem within postgres or expected?
In SQL, NULL means "unknown value". How could you assert that two
NULLs are equal?which doesn't make mathematical sense.
mathwise null is an empty result.
so setting the logic up using the math logic, null values are always equal.
What kind of mathematics you are speaking?
For example you have "infinity" where infinity is never
equal to infinity.
Same with null. Which is "unknown" or "undefined"
So if x is undefined and y is undefined you cannot
assume x=y - because if you assume this, then
they would not be undefined anymore.
q.e.d.
Regards
Tino
On Thu, 20 Jan 2005 15:20:26 +1100, Alex <alex@meerkatsoft.com> wrote:
I actually just wanted to know if there is a way around this problem.
Obviously it is implemented that way for whatever reason.
Well, if you really need it, partial indexes are your friends! :)
For clarity, let's say you have:
CREATE TABLE foo (
a int,
b int,
c int,
);
And an INDEX:
CREATE UNIQUE INDEX foo_abc_index ON foo (a,b,c);
Now, you want to make sure a and b are UNIQUE, when c is null; just do:
CREATE UNIQUE INDEX foo_abN_index ON foo (a,b) WHERE c IS NULL;
Or even, to make b UNIQUE when a and c are null:
CREATE UNIQUE INDEX foo_NbN_index ON foo (b) WHERE a IS NULL AND c IS NULL;
You need to create such partial indexes for each set of columns
you want to be unique-with-null.
Don't worry about "index bloat". These additional indexes will be used
only when your main (foo_abc_index) is not used, so there won't be
any duplicate data in them.
Isn't PostgreSQL great? :)
Regards,
Dawid