AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards
Yes, column = NULL should *never* return true according to the spec (it
should always return NULL in fact as stated). The reason for breaking
with the spec is AFAIK to work with broken microsoft clients that seem to
think that =NULL is a meaningful test and generate queries using that.Microsoft Access is the guilty party, IIRC. I recently tried to stir up
some interest in changing this behavior back to the standard, but
apparently there are still too many people using broken versions of
Access.
Actually I am not sure whether the column = NULL syntax is even defined
or allowed in SQL92 (e.g. Informix interprets the NULL as column name in
this context and errs out).
If not it would simply be an extension to the standard with the defined
behavior of beeing identical to "column is null".
Andreas
Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:
Actually I am not sure whether the column = NULL syntax is even defined
or allowed in SQL92 (e.g. Informix interprets the NULL as column name in
this context and errs out).
I don't have the standard handy, but I do have Joe Celko's book, "Data
& Databases: Concepts in Practice". He says (in section 8.2, under
the heading "Multivalued Logic"):
A NULL cannot be compared to another NULL or to a value
with what Dr. Codd called a theta operator and what
programmers call a comparison operator (equal, not equal,
less than, greater than, and so forth). This results in
a three-valued logic, which has an UNKNOWN in addition
to TRUE and FALSE. [...] UNKNOWN is a logical value and
not the same as a NULL, which is a data value. That is
why you have to say X IS [NOT] NULL in SQL and not use
X = NULL instead. Theta operators are expressions of the
form X <comp op> Y; when X or Y or both are NULL, theta
operators will return an UNKNOWN and not a NULL.
He goes on to explain three-valued logic in more detail, showing truth
tables according to Jan Lukasiewicz (the inventor of RPN), and says,
of SQL-92, that it "is comforting to see that [it has] the same truth
tables as the three-valued system of Lukasiewicz". Further, he says:
SQL-92 added a new predicate of the form
<search condition> IS [NOT] TRUE | FALSE | UNKNOWN
which will let you map any combination of three-valued
logic to the two Boolean values.
A quick test run with psql shows that PostgreSQL does not properly
implement three-valued logic: it does not recognize the UNKNOWN
keyword alongside TRUE and FALSE, in any situation. It will also
return boolean truth values for comparisons with NULL values, using
them as "real" data values in the comparison. Worse (IMHO), this is
not consistent: while a test for "column = NULL" will return rows
where that is true, and a test for "not column = NULL" will return the
rest, "column <> NULL" returns no rows! This means that the theta
operators are not all treated the same way, which is surely wrong!
It seems to me that the idea of NULL as an unkown data value and
UNKNOWN as the corresponding truth value, combined with the rules for
propagation of NULL in mathematical operations, of UNKNOWN in truth
operations, and from NULL to UNKNOWN by theta operators, is a very
clean, intuitive way of handling these issues. It feels right! :-)
-tih
--
The basic difference is this: hackers build things, crackers break them.
Zeugswetter Andreas SB writes:
Actually I am not sure whether the column = NULL syntax is even defined
or allowed in SQL92
I think you're right.
(e.g. Informix interprets the NULL as column name in this context and
errs out).
NULL is a reserved word, so this would be wrong.
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
On 7 Jun 2001, Tom Ivar Helbekkmo wrote:
Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:
Actually I am not sure whether the column = NULL syntax is even defined
or allowed in SQL92 (e.g. Informix interprets the NULL as column name in
this context and errs out).He goes on to explain three-valued logic in more detail, showing truth
tables according to Jan Lukasiewicz (the inventor of RPN), and says,
of SQL-92, that it "is comforting to see that [it has] the same truth
tables as the three-valued system of Lukasiewicz". Further, he says:SQL-92 added a new predicate of the form
<search condition> IS [NOT] TRUE | FALSE | UNKNOWN
which will let you map any combination of three-valued
logic to the two Boolean values.A quick test run with psql shows that PostgreSQL does not properly
implement three-valued logic: it does not recognize the UNKNOWN
keyword alongside TRUE and FALSE, in any situation. It will also
return boolean truth values for comparisons with NULL values, using
them as "real" data values in the comparison. Worse (IMHO), this is
not consistent: while a test for "column = NULL" will return rows
where that is true, and a test for "not column = NULL" will return the
rest, "column <> NULL" returns no rows! This means that the theta
operators are not all treated the same way, which is surely wrong!
That's the nature of the hack we're talking about. It's a grammar level
hack to turn a specific sequence of tokens (= NULL) into IS NULL due
to a client's generated queries. If you're comparing something other
than the constant NULL, it should do what is expected (ie, a comparison
between a NULL in a table or even CAST(NULL as INT4) does the "right"
thing).
I think adding IS UNKNOWN would probably be trivial (I think the code is
basically there in IS NULL.)
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
That's the nature of the hack we're talking about. It's a grammar
level hack to turn a specific sequence of tokens (= NULL) into IS
NULL due to a client's generated queries.
Aha! Sorry -- I jumped in late in the discussion without checking
back to see how it started...
OK, I've already said that I like the cleanliness and orthogonality of
NULL is a missing data value, UNKNOWN is a missing truth value, both
propagate in expressions, comparisons with NULL generate UNKNOWN, and
you can use the special comparisons IS [NOT] NULL and IS [NOT] UNKNOWN
to get plain, two-valued Boolean truth values out of them.
The Microsoft compatibility hack is ugly, and should be either a)
removed, b) expanded to include the other comparison operators and
documented as a PostgreSQL proprietary extension, or c) made into a
special feature that's turned on at will by a SET command. I would
applaud a), approve of c), and be dismayed by b).
I think adding IS UNKNOWN would probably be trivial (I think the
code is basically there in IS NULL.)
But if it's implemented, shouldn't the code also differentiate between
UNKNOWN and NULL, by not (as now) using the latter to represent the
former? Or do I misunderstand how it's handled now?
-tih
--
The basic difference is this: hackers build things, crackers break them.
Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:
Actually I am not sure whether the column = NULL syntax is even
defined or allowed in SQL92
I've just checked, by reading the relevant paragraphs and studying the
BNF, and the standard says that any comparison of the form X <comp op>
Y is unknown if X or Y (or both) is NULL, including the case where
NULL is given as an explicit constant. So, SQL92 clearly demands that
"column = NULL" is UNKNOWN, never TRUE or FALSE.
-tih
--
The basic difference is this: hackers build things, crackers break them.
On 7 Jun 2001, Tom Ivar Helbekkmo wrote:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
That's the nature of the hack we're talking about. It's a grammar
level hack to turn a specific sequence of tokens (= NULL) into IS
NULL due to a client's generated queries.Aha! Sorry -- I jumped in late in the discussion without checking
back to see how it started...OK, I've already said that I like the cleanliness and orthogonality of
NULL is a missing data value, UNKNOWN is a missing truth value, both
propagate in expressions, comparisons with NULL generate UNKNOWN, and
you can use the special comparisons IS [NOT] NULL and IS [NOT] UNKNOWN
to get plain, two-valued Boolean truth values out of them.The Microsoft compatibility hack is ugly, and should be either a)
removed, b) expanded to include the other comparison operators and
documented as a PostgreSQL proprietary extension, or c) made into a
special feature that's turned on at will by a SET command. I would
applaud a), approve of c), and be dismayed by b).
c is the most likely thing to happen probably.
I think adding IS UNKNOWN would probably be trivial (I think the
code is basically there in IS NULL.)But if it's implemented, shouldn't the code also differentiate between
UNKNOWN and NULL, by not (as now) using the latter to represent the
former? Or do I misunderstand how it's handled now?
Within what's there (using null as unknown), the two tests are nearly
identical and would probably be just a grammar change. Creating a
separate unknown would be more difficult, and I'm not sure it's necessary
to make the distinction. NULL is an unknown value, I'm not sure that
you'd need a separate unknown value specifically for booleans.
Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:
Actually I am not sure whether the column = NULL syntax is even defined
or allowed in SQL92 (e.g. Informix interprets the NULL as column name in
this context and errs out).
Strictly speaking, SQL92 would require you to write
foo = CAST (NULL AS type-of-foo)
However, we allow unadorned NULL in other contexts as a shorthand for
the CAST notation, so it's inconsistent of us to say that in this
context it means something different.
The real problem with accepting this Microsoftism is that it's a trap
for unwary programmers. Case 1: someone who's not studied SQL in detail
might experiment with examples involving "foo = NULL" and jump to
reasonable but entirely incorrect conclusions about how comparisons
involving NULL operate. Case 2: someone who *has* studied SQL, and is
also aware that we accept unadorned NULLs, will also draw the wrong
conclusions about what this construct will do. Bottom line: this kluge
surprises everyone except those who already know it exists. I don't
like systems that surprise their users in inconsistent ways.
regards, tom lane
Tom Ivar Helbekkmo <tih@kpnQwest.no> quotes:
... This results in
a three-valued logic, which has an UNKNOWN in addition
to TRUE and FALSE. [...] UNKNOWN is a logical value and
not the same as a NULL, which is a data value.
SQL92 is not very clear about whether NULL and UNKNOWN are distinct,
but it is worth noticing that their truth tables for comparison
operators, and/or/not, etc, only mention unknown --- never null ---
as a possible value of a boolean condition. SQL99 clarifies the
intent:
The data type boolean comprises the distinct truth values true and
false. Unless prohibited by a NOT NULL constraint, the boolean
data type also supports the unknown truth value as the null value.
This specification does not make a distinction between the null
value of the boolean data type and the unknown truth value that is
the result of an SQL <predicate>, <search condition>, or <boolean
value expression>; they may be used interchangeably to mean exactly
the same thing.
Which in fact is what Postgres does.
A quick test run with psql shows that PostgreSQL does not properly
implement three-valued logic: it does not recognize the UNKNOWN
keyword alongside TRUE and FALSE, in any situation.
We do not currently have correct implementations of IS TRUE, IS FALSE,
or IS UNKNOWN (IS TRUE/FALSE are in there but give the wrong result
for null inputs). This is on my to-do list to fix; not sure if the
master TODO list mentions it or not. Actually it'd be a good project
for a newbie hacker who wants to learn about the backend's
expression-handling machinery. Anyone want to take it on?
It's also worth noticing that our implementation of IS NULL isn't really
up to speed: the spec allows the argument to be a row value constructor,
not just a scalar. But we mostly don't have support for row-value-
constructor expressions anyway (it's not an Entry SQL feature).
regards, tom lane
We do not currently have correct implementations of IS TRUE, IS FALSE,
or IS UNKNOWN (IS TRUE/FALSE are in there but give the wrong result
for null inputs). This is on my to-do list to fix; not sure if the
master TODO list mentions it or not. Actually it'd be a good project
for a newbie hacker who wants to learn about the backend's
expression-handling machinery. Anyone want to take it on?
I'd like to finish up the has_table_privilege function over the next week or
so and then take this on. Can you point me in a direction to start looking?
Thanks,
Joe
The real problem with accepting this Microsoftism is that it's a trap
for unwary programmers. Case 1: someone who's not studied SQL in detail
might experiment with examples involving "foo = NULL" and jump to
reasonable but entirely incorrect conclusions about how comparisons
involving NULL operate. Case 2: someone who *has* studied SQL, and is
also aware that we accept unadorned NULLs, will also draw the wrong
conclusions about what this construct will do. Bottom line: this kluge
surprises everyone except those who already know it exists. I don't
like systems that surprise their users in inconsistent ways.
These were all points that were brought up and discussed when the hack
was implemented. At the time, the trade between implementing a construct
which is *not allowed* in SQL9x vs enabling the M$Access community to
migrate to PostgreSQL was hashed over and a consensus was reached that
the benefits to allowing it outweighed the drawbacks.
I was of the initial opinion that we should not support M$ pathelogical
non-standard constructs (jeez, they should know better, and probably do,
so it is likely a pathetic attempt to lock in users). But since the
construct is not allowed (or useless), why would anyone feel they need
to use it?
Clearly it is not the case that "this kluge surprises everyone except
those who already know it exists." We have had a strong consensus that
SQL9x standard constructs should be the norm for PostgreSQL, so no need
to rehash that. The issue boils down to, as it did when it first came
up, whether we will make it easier for M$Access users to start migrating
to PostgreSQL. If newer versions of Access emit standard constructs,
then it would be even easier to say that we should jettison the
construct. But if not, istm that we are killing a usability feature on
principle, not from need, and it may be premature to do that.
Using a "SET xxx" switch is not a step in the right direction istm since
it requires an extra PostgreSQL-specific command to get things working
with Access. We aren't teaching SQL, we are trying to let people get to
their data.
btw, I *was* suprised to see the "IS UNKNOWN" construct. It's been
lurking in my reference books for years. afaict it is very uncommonly
used, since most RDBMSes accept IS NULL as an equivalent. But is is
trivial to add to the grammar.
- Thomas
"Joe Conway" <joe@conway-family.com> writes:
We do not currently have correct implementations of IS TRUE, IS FALSE,
or IS UNKNOWN (IS TRUE/FALSE are in there but give the wrong result
for null inputs). This is on my to-do list to fix; not sure if the
master TODO list mentions it or not. Actually it'd be a good project
for a newbie hacker who wants to learn about the backend's
expression-handling machinery. Anyone want to take it on?
I'd like to finish up the has_table_privilege function over the next week or
so and then take this on. Can you point me in a direction to start looking?
The way things currently work is that gram.y translates "x IS TRUE" etc
to "x = true" etc. This is wrong because it does the wrong thing for
null input. Another objection is that it's impossible for ruleutils.c
to reverse-list the expression tree in its original form.
IS [NOT] NULL is handled a little differently: gram.y generates a
specialized Expr node, which parse_expr.c translates to a function call
on the specialized functions nullvalue() and nonnullvalue()
respectively. I don't much care for this implementation either, again
partly because ruleutils.c has to be uglified to deal with it, but
partly because the optimizer can't cheaply recognize IS NULL tests
either.
I'd like to see all eight of these guys translated into a specialized
kind of expression node, called perhaps BooleanTest. Actually, it'd
probably be wise to keep IS NULL separate from the six boolean tests,
with an eye to the future when it will need to support nonscalar
arguments. So maybe BooleanTest and NullTest node types, each with a
field showing exactly which test is wanted.
Adding a new expression node type is a straightforward but rather
tedious exercise in teaching some dozens of places what to do with it.
A grep for existing expression node types, such as CaseExpr or
FieldSelect or RelabelType, will give you a good idea what needs to be
done.
regards, tom lane
The way things currently work is that gram.y translates "x IS TRUE" etc
to "x = true" etc. This is wrong because it does the wrong thing for
null input. Another objection is that it's impossible for ruleutils.c
to reverse-list the expression tree in its original form.
fyi, in case it helps: we used to have gram.y translate these into
function calls, rather than the operator expression. But that precluded
the optimizer from ever having a shot at optimizing out "const = const"
kinds of expressions and other fluff.
If we go to a specialized node in the parse tree, then the optimizer
could be taught to handle that, which is better than the original
straight function call which would have masked things too much.
- Thomas
Thomas Lockhart <lockhart@fourpalms.org> writes:
But since the construct is not allowed (or useless), why would
anyone feel they need to use it?
Because it isn't entirely useless, actually. I agree that no
programmer in his right mind would write, by hand, a comparison
involving NULL, knowing that the truth value of that comparison is
required by the standard to be UNKNOWN (i.e. NULL). However, I'm
looking at using machine generated SQL code (generated on the fly in
an interactive application) to implement a dynamically adapting set of
tables, rules/triggers and their supporting stored procedures, and
it's just a matter of time before the first "= NULL" happens to show
up in code generated like that. I'd like it to behave according to
the standard when that situation occurs, and the standard says that
any comparison with NULL, even "NULL = NULL", is UNKNOWN.
-tih
--
The basic difference is this: hackers build things, crackers break them.
Thomas Lockhart <lockhart@fourpalms.org> writes:
Clearly it is not the case that "this kluge surprises everyone except
those who already know it exists."
How can you argue that, when the topic comes up again every couple of
months?
regards, tom lane
Clearly it is not the case that "this kluge surprises everyone except
those who already know it exists."How can you argue that, when the topic comes up again every couple of
months?
I'm not looking for an argument. But the statement is not factually
true: "suprises everyone" (most folks don't notice, and don't care much)
and "every couple of months" (??) stray far enough from facts that we
should get back on topic. Whatever the facts, the messages that we do
*not* get are just as significant: a relatively large portion of
feedback from users of both Access and PostgreSQL at the time the
"feature" was added indicated that it was a significant stumbling block
for those users, and those messages will start up anew without adequate
planning and implementation.
Since back then, we have some additional clarification that it occurs
only for users of Access/Forms, and that others are worried that it will
lead to difficulties for others under different circumstances (please
remember, or note if it is too long ago, that at the time I argued
against adding the "feature", but tried to listen to those who would
find it useful). I'm not ignoring those worries, but in the battle
between purity and usefulness we shouldn't always line up with the
strongest or loudest voice(s) that day, but try to respect and keep in
mind the others who have contributed to the discussion in the past.
That said, the issues should be broken down into managable chunks, but
imho forcing the last step (removal of the "= NULL" accomodation) first
is premature. How about phasing this so that we can accomodate the
long-standing issue of M$ interoperability (via ODBC improvements to
make it transparent) before ripping out the current workaround?
From Andreas' comments, it seems that for his application he would like
a different behavior, but frankly I'm not certain why the current
behavior would be detrimental in the use case he mentioned. If SQL92
requires that any query with "= NULL" be rejected as illegal (my books
aren't nearby at the moment), he might still want to have code at the
application level for some graceful handling of illegal values.
- Thomas
IS [NOT] NULL is handled a little differently: gram.y generates a
specialized Expr node, which parse_expr.c translates to a function call
on the specialized functions nullvalue() and nonnullvalue()
respectively. I don't much care for this implementation either, again
partly because ruleutils.c has to be uglified to deal with it, but
partly because the optimizer can't cheaply recognize IS NULL tests
either.I'd like to see all eight of these guys translated into a specialized
kind of expression node, called perhaps BooleanTest. Actually, it'd
probably be wise to keep IS NULL separate from the six boolean tests,
with an eye to the future when it will need to support nonscalar
arguments. So maybe BooleanTest and NullTest node types, each with a
field showing exactly which test is wanted.
Attached is a patch for a new NullTest node type for review and comment.
Since it didn't seem like there was consensus regarding removal of the "a =
null" conversion to "a is null" behavior, I left it in. It is worth
mentioning, however, that neither Oracle 8.1.6 or MSSQL 7 seem to support
this -- see below:
Oracle:
****************************************
SQL> select f1,f2 from foo where f2 = null;
no rows selected
MSSQL 7
****************************************
select f1,f2 from foo where f2 = null
f1 f2
----------- --------------------------------------------------
(0 row(s) affected)
PostgreSQL
****************************************
test=# select f1,f2 from foo where f2 = null;
f1 | f2
----+----
1 |
4 |
(2 rows)
In all 3 cases table foo has 4 rows, 2 of which have null values for f2.
Based on this, should support for the converting "a = null" to "a is null"
be dropped?
I also noticed that in PostgreSQL I can do the following (both before and
after this patch):
select f2 is null from foo;
whereas in both Oracle and MSSQL it causes a syntax error. Any thoughts on
this?
Thanks,
-- Joe
Attachments:
nulltest01.diffapplication/octet-stream; name=nulltest01.diffDownload+233-9
"Joe Conway" <joseph.conway@home.com> writes:
Attached is a patch for a new NullTest node type for review and comment.
I assume you are just looking for review at this point; I would not
recommend applying to CVS until the BooleanTest part is done too.
(Since parsetree changes affect stored rules, the change really should
include a catversion.h increment, and thus it's best to bunch this sort
of change together to avoid forcing extra initdbs on other hackers.)
I'll look through the code later, but...
Based on this, should support for the converting "a = null" to "a is null"
be dropped?
My opinion on that is already on record ;-)
I also noticed that in PostgreSQL I can do the following (both before and
after this patch):
select f2 is null from foo;
whereas in both Oracle and MSSQL it causes a syntax error. Any thoughts on
this?
Curious; I'd have said that that is clearly within the spec. Can anyone
check it on some other engines?
regards, tom lane
I assume you are just looking for review at this point; I would not
recommend applying to CVS until the BooleanTest part is done too.
(Since parsetree changes affect stored rules, the change really should
include a catversion.h increment, and thus it's best to bunch this sort
of change together to avoid forcing extra initdbs on other hackers.)
I'll look through the code later, but...
OK -- I'll get started on BooleanTest in the next day or two.
-- Joe
"Joe Conway" <joseph.conway@home.com> writes:
I also noticed that in PostgreSQL I can do the following (both before and
after this patch):
select f2 is null from foo;
whereas in both Oracle and MSSQL it causes a syntax error. Any thoughts on
this?
I dug into this further and discovered that indeed it is not SQL92
... but it is SQL99. Amazingly enough, SQL92 doesn't allow boolean
expressions as a possible type of general expression:
<value expression> ::=
<numeric value expression>
| <string value expression>
| <datetime value expression>
| <interval value expression>
It only allows them as <search condition>s, which is to say WHERE,
HAVING, CASE WHEN, CHECK, and one or two other places.
But SQL99 gets it right:
<value expression> ::=
<numeric value expression>
| <string value expression>
| <datetime value expression>
| <interval value expression>
| <boolean value expression>
| <user-defined type value expression>
| <row value expression>
| <reference value expression>
| <collection value expression>
Looks like we're ahead of the curve here...
regards, tom lane