AW: Re: [SQL] behavior of &#39; = NULL&#39; vs. MySQL vs. Stand ards

tih@kpnQwest.no

over 24 years ago

In reply to: Zeugswetter Andreas SB (#1)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:

Actually I am not sure whether the column = NULL syntax is even defined
or allowed in SQL92 (e.g. Informix interprets the NULL as column name in
this context and errs out).

I don't have the standard handy, but I do have Joe Celko's book, "Data
& Databases: Concepts in Practice". He says (in section 8.2, under
the heading "Multivalued Logic"):

A NULL cannot be compared to another NULL or to a value
with what Dr. Codd called a theta operator and what
programmers call a comparison operator (equal, not equal,
less than, greater than, and so forth). This results in
a three-valued logic, which has an UNKNOWN in addition
to TRUE and FALSE. [...] UNKNOWN is a logical value and
not the same as a NULL, which is a data value. That is
why you have to say X IS [NOT] NULL in SQL and not use
X = NULL instead. Theta operators are expressions of the
form X <comp op> Y; when X or Y or both are NULL, theta
operators will return an UNKNOWN and not a NULL.

He goes on to explain three-valued logic in more detail, showing truth
tables according to Jan Lukasiewicz (the inventor of RPN), and says,
of SQL-92, that it "is comforting to see that [it has] the same truth
tables as the three-valued system of Lukasiewicz". Further, he says:

SQL-92 added a new predicate of the form

<search condition> IS [NOT] TRUE | FALSE | UNKNOWN

which will let you map any combination of three-valued
logic to the two Boolean values.

A quick test run with psql shows that PostgreSQL does not properly
implement three-valued logic: it does not recognize the UNKNOWN
keyword alongside TRUE and FALSE, in any situation. It will also
return boolean truth values for comparisons with NULL values, using
them as "real" data values in the comparison. Worse (IMHO), this is
not consistent: while a test for "column = NULL" will return rows
where that is true, and a test for "not column = NULL" will return the
rest, "column <> NULL" returns no rows! This means that the theta
operators are not all treated the same way, which is surely wrong!

It seems to me that the idea of NULL as an unkown data value and
UNKNOWN as the corresponding truth value, combined with the rules for
propagation of NULL in mathematical operations, of UNKNOWN in truth
operations, and from NULL to UNKNOWN by theta operators, is a very
clean, intuitive way of handling these issues. It feels right! :-)

-tih
--
The basic difference is this: hackers build things, crackers break them.

Peter Eisentraut

peter_e@gmx.net

over 24 years ago

In reply to: Zeugswetter Andreas SB (#1)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Zeugswetter Andreas SB writes:

Actually I am not sure whether the column = NULL syntax is even defined
or allowed in SQL92

I think you're right.

(e.g. Informix interprets the NULL as column name in this context and
errs out).

NULL is a reserved word, so this would be wrong.

--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter

Stephan Szabo

sszabo@megazone23.bigpanda.com

over 24 years ago

In reply to: Tom Ivar Helbekkmo (#2)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

On 7 Jun 2001, Tom Ivar Helbekkmo wrote:

Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:

Actually I am not sure whether the column = NULL syntax is even defined
or allowed in SQL92 (e.g. Informix interprets the NULL as column name in
this context and errs out).

He goes on to explain three-valued logic in more detail, showing truth
tables according to Jan Lukasiewicz (the inventor of RPN), and says,
of SQL-92, that it "is comforting to see that [it has] the same truth
tables as the three-valued system of Lukasiewicz". Further, he says:

SQL-92 added a new predicate of the form

<search condition> IS [NOT] TRUE | FALSE | UNKNOWN

which will let you map any combination of three-valued
logic to the two Boolean values.

A quick test run with psql shows that PostgreSQL does not properly
implement three-valued logic: it does not recognize the UNKNOWN
keyword alongside TRUE and FALSE, in any situation. It will also
return boolean truth values for comparisons with NULL values, using
them as "real" data values in the comparison. Worse (IMHO), this is
not consistent: while a test for "column = NULL" will return rows
where that is true, and a test for "not column = NULL" will return the
rest, "column <> NULL" returns no rows! This means that the theta
operators are not all treated the same way, which is surely wrong!

That's the nature of the hack we're talking about. It's a grammar level
hack to turn a specific sequence of tokens (= NULL) into IS NULL due
to a client's generated queries. If you're comparing something other
than the constant NULL, it should do what is expected (ie, a comparison
between a NULL in a table or even CAST(NULL as INT4) does the "right"
thing).

I think adding IS UNKNOWN would probably be trivial (I think the code is
basically there in IS NULL.)

tih@kpnQwest.no

over 24 years ago

In reply to: Stephan Szabo (#4)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

That's the nature of the hack we're talking about. It's a grammar
level hack to turn a specific sequence of tokens (= NULL) into IS
NULL due to a client's generated queries.

Aha! Sorry -- I jumped in late in the discussion without checking
back to see how it started...

OK, I've already said that I like the cleanliness and orthogonality of
NULL is a missing data value, UNKNOWN is a missing truth value, both
propagate in expressions, comparisons with NULL generate UNKNOWN, and
you can use the special comparisons IS [NOT] NULL and IS [NOT] UNKNOWN
to get plain, two-valued Boolean truth values out of them.

The Microsoft compatibility hack is ugly, and should be either a)
removed, b) expanded to include the other comparison operators and
documented as a PostgreSQL proprietary extension, or c) made into a
special feature that's turned on at will by a SET command. I would
applaud a), approve of c), and be dismayed by b).

I think adding IS UNKNOWN would probably be trivial (I think the
code is basically there in IS NULL.)

But if it's implemented, shouldn't the code also differentiate between
UNKNOWN and NULL, by not (as now) using the latter to represent the
former? Or do I misunderstand how it's handled now?

-tih
--
The basic difference is this: hackers build things, crackers break them.

tih@kpnQwest.no

over 24 years ago

In reply to: Zeugswetter Andreas SB (#1)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:

Actually I am not sure whether the column = NULL syntax is even
defined or allowed in SQL92

I've just checked, by reading the relevant paragraphs and studying the
BNF, and the standard says that any comparison of the form X <comp op>
Y is unknown if X or Y (or both) is NULL, including the case where
NULL is given as an explicit constant. So, SQL92 clearly demands that
"column = NULL" is UNKNOWN, never TRUE or FALSE.

-tih
--
The basic difference is this: hackers build things, crackers break them.

Stephan Szabo

sszabo@megazone23.bigpanda.com

over 24 years ago

In reply to: Tom Ivar Helbekkmo (#5)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

On 7 Jun 2001, Tom Ivar Helbekkmo wrote:

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

That's the nature of the hack we're talking about. It's a grammar
level hack to turn a specific sequence of tokens (= NULL) into IS
NULL due to a client's generated queries.

Aha! Sorry -- I jumped in late in the discussion without checking
back to see how it started...

OK, I've already said that I like the cleanliness and orthogonality of
NULL is a missing data value, UNKNOWN is a missing truth value, both
propagate in expressions, comparisons with NULL generate UNKNOWN, and
you can use the special comparisons IS [NOT] NULL and IS [NOT] UNKNOWN
to get plain, two-valued Boolean truth values out of them.

The Microsoft compatibility hack is ugly, and should be either a)
removed, b) expanded to include the other comparison operators and
documented as a PostgreSQL proprietary extension, or c) made into a
special feature that's turned on at will by a SET command. I would
applaud a), approve of c), and be dismayed by b).

c is the most likely thing to happen probably.

I think adding IS UNKNOWN would probably be trivial (I think the
code is basically there in IS NULL.)

But if it's implemented, shouldn't the code also differentiate between
UNKNOWN and NULL, by not (as now) using the latter to represent the
former? Or do I misunderstand how it's handled now?

Within what's there (using null as unknown), the two tests are nearly
identical and would probably be just a grammar change. Creating a
separate unknown would be more difficult, and I'm not sure it's necessary
to make the distinction. NULL is an unknown value, I'm not sure that
you'd need a separate unknown value specifically for booleans.

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Zeugswetter Andreas SB (#1)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:

Actually I am not sure whether the column = NULL syntax is even defined
or allowed in SQL92 (e.g. Informix interprets the NULL as column name in
this context and errs out).

Strictly speaking, SQL92 would require you to write
foo = CAST (NULL AS type-of-foo)
However, we allow unadorned NULL in other contexts as a shorthand for
the CAST notation, so it's inconsistent of us to say that in this
context it means something different.

The real problem with accepting this Microsoftism is that it's a trap
for unwary programmers. Case 1: someone who's not studied SQL in detail
might experiment with examples involving "foo = NULL" and jump to
reasonable but entirely incorrect conclusions about how comparisons
involving NULL operate. Case 2: someone who *has* studied SQL, and is
also aware that we accept unadorned NULLs, will also draw the wrong
conclusions about what this construct will do. Bottom line: this kluge
surprises everyone except those who already know it exists. I don't
like systems that surprise their users in inconsistent ways.

regards, tom lane

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Tom Ivar Helbekkmo (#2)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Tom Ivar Helbekkmo <tih@kpnQwest.no> quotes:

... This results in
a three-valued logic, which has an UNKNOWN in addition
to TRUE and FALSE. [...] UNKNOWN is a logical value and
not the same as a NULL, which is a data value.

SQL92 is not very clear about whether NULL and UNKNOWN are distinct,
but it is worth noticing that their truth tables for comparison
operators, and/or/not, etc, only mention unknown --- never null ---
as a possible value of a boolean condition. SQL99 clarifies the
intent:

The data type boolean comprises the distinct truth values true and
false. Unless prohibited by a NOT NULL constraint, the boolean
data type also supports the unknown truth value as the null value.
This specification does not make a distinction between the null
value of the boolean data type and the unknown truth value that is
the result of an SQL <predicate>, <search condition>, or <boolean
value expression>; they may be used interchangeably to mean exactly
the same thing.

Which in fact is what Postgres does.

A quick test run with psql shows that PostgreSQL does not properly
implement three-valued logic: it does not recognize the UNKNOWN
keyword alongside TRUE and FALSE, in any situation.

We do not currently have correct implementations of IS TRUE, IS FALSE,
or IS UNKNOWN (IS TRUE/FALSE are in there but give the wrong result
for null inputs). This is on my to-do list to fix; not sure if the
master TODO list mentions it or not. Actually it'd be a good project
for a newbie hacker who wants to learn about the backend's
expression-handling machinery. Anyone want to take it on?

It's also worth noticing that our implementation of IS NULL isn't really
up to speed: the spec allows the argument to be a row value constructor,
not just a scalar. But we mostly don't have support for row-value-
constructor expressions anyway (it's not an Entry SQL feature).

regards, tom lane

#10

joe@conway-family.com

over 24 years ago

In reply to: Zeugswetter Andreas SB (#1)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

We do not currently have correct implementations of IS TRUE, IS FALSE,
or IS UNKNOWN (IS TRUE/FALSE are in there but give the wrong result
for null inputs). This is on my to-do list to fix; not sure if the
master TODO list mentions it or not. Actually it'd be a good project
for a newbie hacker who wants to learn about the backend's
expression-handling machinery. Anyone want to take it on?

I'd like to finish up the has_table_privilege function over the next week or
so and then take this on. Can you point me in a direction to start looking?

Thanks,

Joe

#11

Thomas Lockhart

lockhart@fourpalms.org

over 24 years ago

In reply to: Zeugswetter Andreas SB (#1)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

The real problem with accepting this Microsoftism is that it's a trap
for unwary programmers. Case 1: someone who's not studied SQL in detail
might experiment with examples involving "foo = NULL" and jump to
reasonable but entirely incorrect conclusions about how comparisons
involving NULL operate. Case 2: someone who *has* studied SQL, and is
also aware that we accept unadorned NULLs, will also draw the wrong
conclusions about what this construct will do. Bottom line: this kluge
surprises everyone except those who already know it exists. I don't
like systems that surprise their users in inconsistent ways.

These were all points that were brought up and discussed when the hack
was implemented. At the time, the trade between implementing a construct
which is *not allowed* in SQL9x vs enabling the M$Access community to
migrate to PostgreSQL was hashed over and a consensus was reached that
the benefits to allowing it outweighed the drawbacks.

I was of the initial opinion that we should not support M$ pathelogical
non-standard constructs (jeez, they should know better, and probably do,
so it is likely a pathetic attempt to lock in users). But since the
construct is not allowed (or useless), why would anyone feel they need
to use it?

Clearly it is not the case that "this kluge surprises everyone except
those who already know it exists." We have had a strong consensus that
SQL9x standard constructs should be the norm for PostgreSQL, so no need
to rehash that. The issue boils down to, as it did when it first came
up, whether we will make it easier for M$Access users to start migrating
to PostgreSQL. If newer versions of Access emit standard constructs,
then it would be even easier to say that we should jettison the
construct. But if not, istm that we are killing a usability feature on
principle, not from need, and it may be premature to do that.

Using a "SET xxx" switch is not a step in the right direction istm since
it requires an extra PostgreSQL-specific command to get things working
with Access. We aren't teaching SQL, we are trying to let people get to
their data.

btw, I *was* suprised to see the "IS UNKNOWN" construct. It's been
lurking in my reference books for years. afaict it is very uncommonly
used, since most RDBMSes accept IS NULL as an equivalent. But is is
trivial to add to the grammar.

- Thomas

#12

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Joe Conway (#10)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

"Joe Conway" <joe@conway-family.com> writes:

We do not currently have correct implementations of IS TRUE, IS FALSE,
or IS UNKNOWN (IS TRUE/FALSE are in there but give the wrong result
for null inputs). This is on my to-do list to fix; not sure if the
master TODO list mentions it or not. Actually it'd be a good project
for a newbie hacker who wants to learn about the backend's
expression-handling machinery. Anyone want to take it on?

I'd like to finish up the has_table_privilege function over the next week or
so and then take this on. Can you point me in a direction to start looking?

The way things currently work is that gram.y translates "x IS TRUE" etc
to "x = true" etc. This is wrong because it does the wrong thing for
null input. Another objection is that it's impossible for ruleutils.c
to reverse-list the expression tree in its original form.

IS [NOT] NULL is handled a little differently: gram.y generates a
specialized Expr node, which parse_expr.c translates to a function call
on the specialized functions nullvalue() and nonnullvalue()
respectively. I don't much care for this implementation either, again
partly because ruleutils.c has to be uglified to deal with it, but
partly because the optimizer can't cheaply recognize IS NULL tests
either.

I'd like to see all eight of these guys translated into a specialized
kind of expression node, called perhaps BooleanTest. Actually, it'd
probably be wise to keep IS NULL separate from the six boolean tests,
with an eye to the future when it will need to support nonscalar
arguments. So maybe BooleanTest and NullTest node types, each with a
field showing exactly which test is wanted.

Adding a new expression node type is a straightforward but rather
tedious exercise in teaching some dozens of places what to do with it.
A grep for existing expression node types, such as CaseExpr or
FieldSelect or RelabelType, will give you a good idea what needs to be
done.

regards, tom lane

#13

Thomas Lockhart

lockhart@fourpalms.org

over 24 years ago

In reply to: Zeugswetter Andreas SB (#1)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

The way things currently work is that gram.y translates "x IS TRUE" etc
to "x = true" etc. This is wrong because it does the wrong thing for
null input. Another objection is that it's impossible for ruleutils.c
to reverse-list the expression tree in its original form.

fyi, in case it helps: we used to have gram.y translate these into
function calls, rather than the operator expression. But that precluded
the optimizer from ever having a shot at optimizing out "const = const"
kinds of expressions and other fluff.

If we go to a specialized node in the parse tree, then the optimizer
could be taught to handle that, which is better than the original
straight function call which would have masked things too much.

- Thomas

#14

tih@kpnQwest.no

over 24 years ago

In reply to: Thomas Lockhart (#11)

Re: Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Thomas Lockhart <lockhart@fourpalms.org> writes:

But since the construct is not allowed (or useless), why would
anyone feel they need to use it?

Because it isn't entirely useless, actually. I agree that no
programmer in his right mind would write, by hand, a comparison
involving NULL, knowing that the truth value of that comparison is
required by the standard to be UNKNOWN (i.e. NULL). However, I'm
looking at using machine generated SQL code (generated on the fly in
an interactive application) to implement a dynamically adapting set of
tables, rules/triggers and their supporting stored procedures, and
it's just a matter of time before the first "= NULL" happens to show
up in code generated like that. I'd like it to behave according to
the standard when that situation occurs, and the standard says that
any comparison with NULL, even "NULL = NULL", is UNKNOWN.

-tih
--
The basic difference is this: hackers build things, crackers break them.

#15

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Thomas Lockhart (#11)

Re: Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Thomas Lockhart <lockhart@fourpalms.org> writes:

Clearly it is not the case that "this kluge surprises everyone except
those who already know it exists."

How can you argue that, when the topic comes up again every couple of
months?

regards, tom lane

#16

Thomas Lockhart

lockhart@alumni.caltech.edu

over 24 years ago

In reply to: Zeugswetter Andreas SB (#1)

Re: Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Clearly it is not the case that "this kluge surprises everyone except
those who already know it exists."

How can you argue that, when the topic comes up again every couple of
months?

I'm not looking for an argument. But the statement is not factually
true: "suprises everyone" (most folks don't notice, and don't care much)
and "every couple of months" (??) stray far enough from facts that we
should get back on topic. Whatever the facts, the messages that we do
*not* get are just as significant: a relatively large portion of
feedback from users of both Access and PostgreSQL at the time the
"feature" was added indicated that it was a significant stumbling block
for those users, and those messages will start up anew without adequate
planning and implementation.

Since back then, we have some additional clarification that it occurs
only for users of Access/Forms, and that others are worried that it will
lead to difficulties for others under different circumstances (please
remember, or note if it is too long ago, that at the time I argued
against adding the "feature", but tried to listen to those who would
find it useful). I'm not ignoring those worries, but in the battle
between purity and usefulness we shouldn't always line up with the
strongest or loudest voice(s) that day, but try to respect and keep in
mind the others who have contributed to the discussion in the past.

That said, the issues should be broken down into managable chunks, but
imho forcing the last step (removal of the "= NULL" accomodation) first
is premature. How about phasing this so that we can accomodate the
long-standing issue of M$ interoperability (via ODBC improvements to
make it transparent) before ripping out the current workaround?

From Andreas' comments, it seems that for his application he would like
a different behavior, but frankly I'm not certain why the current
behavior would be detrimental in the use case he mentioned. If SQL92
requires that any query with "= NULL" be rejected as illegal (my books
aren't nearby at the moment), he might still want to have code at the
application level for some graceful handling of illegal values.

- Thomas

#17

joseph.conway@home.com

over 24 years ago

In reply to: Zeugswetter Andreas SB (#1)

1 attachment(s)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

IS [NOT] NULL is handled a little differently: gram.y generates a
specialized Expr node, which parse_expr.c translates to a function call
on the specialized functions nullvalue() and nonnullvalue()
respectively. I don't much care for this implementation either, again
partly because ruleutils.c has to be uglified to deal with it, but
partly because the optimizer can't cheaply recognize IS NULL tests
either.

I'd like to see all eight of these guys translated into a specialized
kind of expression node, called perhaps BooleanTest. Actually, it'd
probably be wise to keep IS NULL separate from the six boolean tests,
with an eye to the future when it will need to support nonscalar
arguments. So maybe BooleanTest and NullTest node types, each with a
field showing exactly which test is wanted.

Attached is a patch for a new NullTest node type for review and comment.
Since it didn't seem like there was consensus regarding removal of the "a =
null" conversion to "a is null" behavior, I left it in. It is worth
mentioning, however, that neither Oracle 8.1.6 or MSSQL 7 seem to support
this -- see below:

Oracle:
****************************************
SQL> select f1,f2 from foo where f2 = null;

no rows selected

MSSQL 7
****************************************
select f1,f2 from foo where f2 = null
f1 f2
----------- --------------------------------------------------

(0 row(s) affected)

PostgreSQL
****************************************
test=# select f1,f2 from foo where f2 = null;
f1 | f2
----+----
1 |
4 |
(2 rows)

In all 3 cases table foo has 4 rows, 2 of which have null values for f2.
Based on this, should support for the converting "a = null" to "a is null"
be dropped?

I also noticed that in PostgreSQL I can do the following (both before and
after this patch):
select f2 is null from foo;
whereas in both Oracle and MSSQL it causes a syntax error. Any thoughts on
this?

Thanks,

-- Joe

Attachments:

nulltest01.diffapplication/octet-stream; name=nulltest01.diffDownload

diff -Naur pgsql.virg/src/backend/executor/execQual.c pgsql.dev/src/backend/executor/execQual.c
--- pgsql.virg/src/backend/executor/execQual.c	Sat Jun 16 18:56:42 2001
+++ pgsql.dev/src/backend/executor/execQual.c	Mon Jun 18 06:15:59 2001
@@ -62,7 +62,8 @@
 static Datum ExecEvalOr(Expr *orExpr, ExprContext *econtext, bool *isNull);
 static Datum ExecEvalCase(CaseExpr *caseExpr, ExprContext *econtext,
 			 bool *isNull, ExprDoneCond *isDone);
-
+static Datum ExecEvalNullTest(NullTest *ntest, ExprContext *econtext,
+			 bool *isNull, ExprDoneCond *isDone);
 
 /*----------
  *	  ExecEvalArrayRef
@@ -1092,6 +1093,46 @@
 }
 
 /* ----------------------------------------------------------------
+ *		ExecEvalNullTest
+ *
+ *		Evaluate a NullTest node.
+ * ----------------------------------------------------------------
+ */
+static Datum
+ExecEvalNullTest(NullTest *ntest,
+					ExprContext *econtext,
+					bool *isNull,
+					ExprDoneCond *isDone)
+{
+	Datum		result;
+
+	result = ExecEvalExpr(ntest->arg, econtext, isNull, isDone);
+	if (*isNull && ntest->nulltesttype == IS_NULL)
+	{
+		*isNull = FALSE;
+		return BoolGetDatum(TRUE);
+	}
+	else if (*isNull && ntest->nulltesttype == IS_NOT_NULL)
+	{
+		*isNull = FALSE;
+		return BoolGetDatum(FALSE);
+	}
+	else if (!*isNull && ntest->nulltesttype == IS_NULL)
+	{
+		return BoolGetDatum(FALSE);
+	}
+	else if (!*isNull && ntest->nulltesttype == IS_NOT_NULL)
+	{
+		return BoolGetDatum(TRUE);
+	}
+	else									/* should never get here */ 
+	{
+		elog(ERROR, "NullTest node: something unexpected happened!");
+		return BoolGetDatum(NULL);
+	}
+}
+
+/* ----------------------------------------------------------------
  *		ExecEvalFieldSelect
  *
  *		Evaluate a FieldSelect node.
@@ -1262,6 +1303,12 @@
 			break;
 		case T_CaseExpr:
 			retDatum = ExecEvalCase((CaseExpr *) expression,
+									econtext,
+									isNull,
+									isDone);
+			break;
+		case T_NullTest:
+			retDatum = ExecEvalNullTest((NullTest *) expression,
 									econtext,
 									isNull,
 									isDone);
diff -Naur pgsql.virg/src/backend/nodes/copyfuncs.c pgsql.dev/src/backend/nodes/copyfuncs.c
--- pgsql.virg/src/backend/nodes/copyfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/copyfuncs.c	Mon Jun 18 05:11:36 2001
@@ -1017,6 +1017,24 @@
 	return newnode;
 }
 
+/* ----------------
+ *		_copyNullTest
+ * ----------------
+ */
+static NullTest *
+_copyNullTest(NullTest *from)
+{
+	NullTest *newnode = makeNode(NullTest);
+
+	/*
+	 * copy remainder of node
+	 */
+	Node_Copy(from, newnode, arg);
+	newnode->nulltesttype = from->nulltesttype;
+
+	return newnode;
+}
+
 static ArrayRef *
 _copyArrayRef(ArrayRef *from)
 {
@@ -2953,6 +2971,9 @@
 			break;
 		case T_CaseWhen:
 			retval = _copyCaseWhen(from);
+			break;
+		case T_NullTest:
+			retval = _copyNullTest(from);
 			break;
 		case T_FkConstraint:
 			retval = _copyFkConstraint(from);
diff -Naur pgsql.virg/src/backend/nodes/equalfuncs.c pgsql.dev/src/backend/nodes/equalfuncs.c
--- pgsql.virg/src/backend/nodes/equalfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/equalfuncs.c	Mon Jun 18 04:57:58 2001
@@ -1712,6 +1712,16 @@
 	return true;
 }
 
+static bool
+_equalNullTest(NullTest *a, NullTest *b)
+{
+	if (!equal(a->arg, b->arg))
+		return false;
+	if (a->nulltesttype != b->nulltesttype)
+		return false;
+	return true;
+}
+
 /*
  * Stuff from pg_list.h
  */
@@ -2119,6 +2129,9 @@
 			break;
 		case T_CaseWhen:
 			retval = _equalCaseWhen(a, b);
+			break;
+		case T_NullTest:
+			retval = _equalNullTest(a, b);
 			break;
 		case T_FkConstraint:
 			retval = _equalFkConstraint(a, b);
diff -Naur pgsql.virg/src/backend/nodes/outfuncs.c pgsql.dev/src/backend/nodes/outfuncs.c
--- pgsql.virg/src/backend/nodes/outfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/outfuncs.c	Mon Jun 18 04:55:49 2001
@@ -1403,6 +1403,19 @@
 }
 
 /*
+ *	NullTest
+ */
+static void
+_outNullTest(StringInfo str, NullTest *node)
+{
+	appendStringInfo(str, " NULLTEST :arg ");
+	_outNode(str, node->arg);
+
+	appendStringInfo(str, " :nulltesttype %d ",
+				   node->nulltesttype);
+}
+
+/*
  * _outNode -
  *	  converts a Node into ascii string and append it to 'str'
  */
@@ -1639,7 +1652,9 @@
 			case T_CaseWhen:
 				_outCaseWhen(str, obj);
 				break;
-
+			case T_NullTest:
+				_outNullTest(str, obj);
+				break;
 			case T_VariableSetStmt:
 				break;
 			case T_SelectStmt:
diff -Naur pgsql.virg/src/backend/nodes/readfuncs.c pgsql.dev/src/backend/nodes/readfuncs.c
--- pgsql.virg/src/backend/nodes/readfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/readfuncs.c	Mon Jun 18 04:54:06 2001
@@ -860,6 +860,31 @@
 }
 
 /* ----------------
+ *		_readNullTest
+ *
+ *	NullTest is a subclass of Node
+ * ----------------
+ */
+static NullTest *
+_readNullTest(void)
+{
+	NullTest	*local_node;
+	char		*token;
+	int			length;
+
+	local_node = makeNode(NullTest);
+
+	token = pg_strtok(&length); /* eat :arg */
+	local_node->arg = nodeRead(true);	/* now read it */
+
+	token = pg_strtok(&length); /* eat :nulltesttype */
+	token = pg_strtok(&length); /* get nulltesttype */
+	local_node->nulltesttype = (NullTestType) atoi(token);
+
+	return local_node;
+}
+
+/* ----------------
  *		_readVar
  *
  *	Var is a subclass of Expr
@@ -1966,6 +1991,8 @@
 		return_value = _readCaseExpr();
 	else if (length == 4 && strncmp(token, "WHEN", length) == 0)
 		return_value = _readCaseWhen();
+	else if (length == 8 && strncmp(token, "NULLTEST", length) == 0)
+		return_value = _readNullTest();
 	else
 		elog(ERROR, "badly formatted planstring \"%.10s\"...", token);
 
diff -Naur pgsql.virg/src/backend/optimizer/util/clauses.c pgsql.dev/src/backend/optimizer/util/clauses.c
--- pgsql.virg/src/backend/optimizer/util/clauses.c	Sat Jun 16 18:56:46 2001
+++ pgsql.dev/src/backend/optimizer/util/clauses.c	Mon Jun 18 04:50:10 2001
@@ -1580,6 +1580,8 @@
 					return true;
 			}
 			break;
+		case T_NullTest:
+			return walker(((NullTest *) node)->arg, context);
 		case T_SubLink:
 			{
 				SubLink    *sublink = (SubLink *) node;
@@ -1930,6 +1932,16 @@
 				FLATCOPY(newnode, casewhen, CaseWhen);
 				MUTATE(newnode->expr, casewhen->expr, Node *);
 				MUTATE(newnode->result, casewhen->result, Node *);
+				return (Node *) newnode;
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest *ntest = (NullTest *) node;
+				NullTest *newnode;
+
+				FLATCOPY(newnode, ntest, NullTest);
+				MUTATE(newnode->arg, ntest->arg, Node *);
 				return (Node *) newnode;
 			}
 			break;
diff -Naur pgsql.virg/src/backend/parser/gram.y pgsql.dev/src/backend/parser/gram.y
--- pgsql.virg/src/backend/parser/gram.y	Sat Jun 16 18:56:47 2001
+++ pgsql.dev/src/backend/parser/gram.y	Mon Jun 18 05:25:02 2001
@@ -4434,9 +4434,19 @@
 					 * (like Microsoft's).  Turn these into IS NULL exprs.
 					 */
 					if (exprIsNullConstant($3))
-						$$ = makeA_Expr(ISNULL, NULL, $1, NULL);
+						{
+							NullTest *n = makeNode(NullTest);
+							n->arg = $1;
+							n->nulltesttype = IS_NULL;
+							$$ = (Node *)n;
+						}
 					else if (exprIsNullConstant($1))
-						$$ = makeA_Expr(ISNULL, NULL, $3, NULL);
+						{
+							NullTest *n = makeNode(NullTest);
+							n->arg = $3;
+							n->nulltesttype = IS_NULL;
+							$$ = (Node *)n;
+						}
 					else
 						$$ = makeA_Expr(OP, "=", $1, $3);
 				}
@@ -4499,15 +4509,43 @@
 					n->agg_distinct = FALSE;
 					$$ = makeA_Expr(OP, "!~~*", $1, (Node *) n);
 				}
-
+		/* NullTest clause
+		 * Define SQL92-style Null test clause.
+		 * Allow all two forms described in the standard:
+		 *  a IS NULL
+		 *  a IS NOT NULL
+		 * Allow two SQL extensions
+		 *  a ISNULL
+		 *  a NOTNULL
+		 */
 		| a_expr ISNULL
-				{	$$ = makeA_Expr(ISNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NULL;
+					$$ = (Node *)n;
+				}
 		| a_expr IS NULL_P
-				{	$$ = makeA_Expr(ISNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NULL;
+					$$ = (Node *)n;
+				}
 		| a_expr NOTNULL
-				{	$$ = makeA_Expr(NOTNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NOT_NULL;
+					$$ = (Node *)n;
+				}
 		| a_expr IS NOT NULL_P
-				{	$$ = makeA_Expr(NOTNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NOT_NULL;
+					$$ = (Node *)n;
+				}
 		/* IS TRUE, IS FALSE, etc used to be function calls
 		 *  but let's make them expressions to allow the optimizer
 		 *  a chance to eliminate them if a_expr is a constant string.
diff -Naur pgsql.virg/src/backend/parser/parse_expr.c pgsql.dev/src/backend/parser/parse_expr.c
--- pgsql.virg/src/backend/parser/parse_expr.c	Sat Jun 16 18:56:47 2001
+++ pgsql.dev/src/backend/parser/parse_expr.c	Mon Jun 18 05:10:49 2001
@@ -510,6 +510,16 @@
 				break;
 			}
 
+		case T_NullTest:
+			{
+				NullTest   *n = (NullTest *) expr;
+
+				n->arg = transformExpr(pstate, n->arg, precedence);
+
+				result = (Node *) expr;
+				break;
+			}
+
 			/*
 			 * Quietly accept node types that may be presented when we are
 			 * called on an already-transformed tree.
@@ -668,6 +678,9 @@
 			break;
 		case T_CaseWhen:
 			type = exprType(((CaseWhen *) expr)->result);
+			break;
+		case T_NullTest:
+			type = BOOLOID;
 			break;
 		case T_Ident:
 			/* is this right? */
diff -Naur pgsql.virg/src/backend/utils/adt/ruleutils.c pgsql.dev/src/backend/utils/adt/ruleutils.c
--- pgsql.virg/src/backend/utils/adt/ruleutils.c	Sat Jun 16 18:56:48 2001
+++ pgsql.dev/src/backend/utils/adt/ruleutils.c	Sat Jun 16 23:36:15 2001
@@ -1891,6 +1891,21 @@
 			}
 			break;
 
+		case T_NullTest:
+			{
+				NullTest		*ntest = (NullTest *) node;
+
+				get_rule_expr(ntest->arg, context);
+				if (ntest->nulltesttype == IS_NULL)
+					appendStringInfo(buf, "%s", " IS NULL ");
+				else if (ntest->nulltesttype == IS_NOT_NULL)
+					appendStringInfo(buf, "%s", " IS NOT NULL ");
+				else	/* should never happen */
+					elog(ERROR, "invalid NullTest type %d",
+						 ntest->nulltesttype);
+			}
+			break;
+
 		case T_RelabelType:
 			{
 				RelabelType *relabel = (RelabelType *) node;
diff -Naur pgsql.virg/src/include/nodes/nodes.h pgsql.dev/src/include/nodes/nodes.h
--- pgsql.virg/src/include/nodes/nodes.h	Sat Jun 16 18:56:52 2001
+++ pgsql.dev/src/include/nodes/nodes.h	Mon Jun 18 04:35:25 2001
@@ -225,6 +225,7 @@
 	T_RowMarkXXX,				/* not used anymore; tag# available */
 	T_FkConstraint,
 	T_PrivGrantee,
+	T_NullTest,
 
 	/*
 	 * TAGS FOR FUNCTION-CALL CONTEXT AND RESULTINFO NODES (see fmgr.h)
diff -Naur pgsql.virg/src/include/nodes/parsenodes.h pgsql.dev/src/include/nodes/parsenodes.h
--- pgsql.virg/src/include/nodes/parsenodes.h	Sat Jun 16 18:56:52 2001
+++ pgsql.dev/src/include/nodes/parsenodes.h	Mon Jun 18 04:39:00 2001
@@ -1054,6 +1054,28 @@
 	Node	   *result;			/* substitution result */
 } CaseWhen;
 
+/* ----------------
+ * NullTest
+ *
+ * NullTest represents the operation of comparing a value to a NULL
+ * value.  At runtime, the input expression is expected to be IS NULL,
+ * IS NOT NULL, ISNULL, or NOTNULL.
+ * The appropriate test is performed and returned as a Datum.
+ * ----------------
+ */
+
+typedef enum NullTestType
+{
+	IS_NULL, IS_NOT_NULL
+} NullTestType;
+
+typedef struct NullTest
+{
+	NodeTag			type;
+	Node			*arg;			/* input expression */
+	NullTestType	nulltesttype;	/* IS NULL, IS NOT NULL */
+} NullTest;
+
 /*
  * ColumnDef - column definition (used in various creates)
  *

#18

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Joe Conway (#17)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

"Joe Conway" <joseph.conway@home.com> writes:

Attached is a patch for a new NullTest node type for review and comment.

I assume you are just looking for review at this point; I would not
recommend applying to CVS until the BooleanTest part is done too.
(Since parsetree changes affect stored rules, the change really should
include a catversion.h increment, and thus it's best to bunch this sort
of change together to avoid forcing extra initdbs on other hackers.)
I'll look through the code later, but...

Based on this, should support for the converting "a = null" to "a is null"
be dropped?

My opinion on that is already on record ;-)

I also noticed that in PostgreSQL I can do the following (both before and
after this patch):
select f2 is null from foo;
whereas in both Oracle and MSSQL it causes a syntax error. Any thoughts on
this?

Curious; I'd have said that that is clearly within the spec. Can anyone
check it on some other engines?

regards, tom lane

#19

joseph.conway@home.com

over 24 years ago

In reply to: Zeugswetter Andreas SB (#1)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

I assume you are just looking for review at this point; I would not
recommend applying to CVS until the BooleanTest part is done too.
(Since parsetree changes affect stored rules, the change really should
include a catversion.h increment, and thus it's best to bunch this sort
of change together to avoid forcing extra initdbs on other hackers.)
I'll look through the code later, but...

OK -- I'll get started on BooleanTest in the next day or two.

-- Joe

#20

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Joe Conway (#17)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

"Joe Conway" <joseph.conway@home.com> writes:

I also noticed that in PostgreSQL I can do the following (both before and
after this patch):
select f2 is null from foo;
whereas in both Oracle and MSSQL it causes a syntax error. Any thoughts on
this?

I dug into this further and discovered that indeed it is not SQL92
... but it is SQL99. Amazingly enough, SQL92 doesn't allow boolean
expressions as a possible type of general expression:

It only allows them as <search condition>s, which is to say WHERE,
HAVING, CASE WHEN, CHECK, and one or two other places.

But SQL99 gets it right:

Looks like we're ahead of the curve here...

regards, tom lane

#21

joseph.conway@home.com

over 24 years ago

In reply to: Zeugswetter Andreas SB (#1)

1 attachment(s)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

I assume you are just looking for review at this point; I would not
recommend applying to CVS until the BooleanTest part is done too.
(Since parsetree changes affect stored rules, the change really should
include a catversion.h increment, and thus it's best to bunch this sort
of change together to avoid forcing extra initdbs on other hackers.)
I'll look through the code later, but...

OK -- here's the full patch for review. It includes two new expression node
types: NullTest and BooleanTest. I have a couple of questions/comments
remaining WRT this:

-- Should I increment catversion.h as part of the patch (I didn't in this
patch), or is that usually centrally controlled by Bruce (or whomever
commits the change)?

-- IMHO, if we are going to keep the (a = null) to (a is null) conversion,
then there should also be a similar conversion from (a != null) to (a is not
null). Otherwise the two operations which may be expected to be
complimentary (as evidenced by at least one recent post) are not.

-- If I have interpreted SQL92 correctly UNKNOWN IS TRUE should return
FALSE, and UNKNOWN IS NOT TRUE is equivalent to NOT (UNKNOWN IS TRUE) ==>
TRUE. Is this correct?

Thanks,

-- Joe

Attachments:

newnodes01.diffapplication/octet-stream; name=newnodes01.diffDownload

diff -Naur pgsql.virg/src/backend/executor/execQual.c pgsql.dev/src/backend/executor/execQual.c
--- pgsql.virg/src/backend/executor/execQual.c	Sat Jun 16 18:56:42 2001
+++ pgsql.dev/src/backend/executor/execQual.c	Tue Jun 19 05:26:22 2001
@@ -62,7 +62,10 @@
 static Datum ExecEvalOr(Expr *orExpr, ExprContext *econtext, bool *isNull);
 static Datum ExecEvalCase(CaseExpr *caseExpr, ExprContext *econtext,
 			 bool *isNull, ExprDoneCond *isDone);
-
+static Datum ExecEvalNullTest(NullTest *ntest, ExprContext *econtext,
+			 bool *isNull, ExprDoneCond *isDone);
+static Datum ExecEvalBooleanTest(BooleanTest *btest, ExprContext *econtext,
+			 bool *isNull, ExprDoneCond *isDone);
 
 /*----------
  *	  ExecEvalArrayRef
@@ -1092,6 +1095,110 @@
 }
 
 /* ----------------------------------------------------------------
+ *		ExecEvalNullTest
+ *
+ *		Evaluate a NullTest node.
+ * ----------------------------------------------------------------
+ */
+static Datum
+ExecEvalNullTest(NullTest *ntest,
+					ExprContext *econtext,
+					bool *isNull,
+					ExprDoneCond *isDone)
+{
+	Datum		result;
+
+	result = ExecEvalExpr(ntest->arg, econtext, isNull, isDone);
+    switch (ntest->nulltesttype)
+    {
+        case IS_NULL:
+            if (*isNull)
+            {
+                *isNull = false;
+                return BoolGetDatum(true);
+            }
+            else
+                return BoolGetDatum(false);
+        case IS_NOT_NULL:
+            if (*isNull)
+            {
+                *isNull = false;
+                return BoolGetDatum(false);
+            }
+            else
+                return BoolGetDatum(true);
+        default:
+            elog(ERROR, "ExecEvalNullTest: unexpected nulltesttype %d",
+                 (int) ntest->nulltesttype);
+            return (Datum) 0;  /* keep compiler quiet */
+    }
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEvalBooleanTest
+ *
+ *		Evaluate a BooleanTest node.
+ * ----------------------------------------------------------------
+ */
+static Datum
+ExecEvalBooleanTest(BooleanTest *btest,
+					ExprContext *econtext,
+					bool *isNull,
+					ExprDoneCond *isDone)
+{
+	Datum		result;
+
+	result = ExecEvalExpr(btest->arg, econtext, isNull, isDone);
+    switch (btest->booltesttype)
+    {
+        case IS_TRUE:
+            if (*isNull)
+            {
+                *isNull = false;
+                return BoolGetDatum(false);
+            }
+            else if (result == true)
+                return BoolGetDatum(true);
+			else
+                return BoolGetDatum(false);
+        case IS_NOT_FALSE:
+            if (*isNull)
+            {
+                *isNull = false;
+                return BoolGetDatum(true);
+            }
+            else if (result == true)
+                return BoolGetDatum(true);
+			else
+                return BoolGetDatum(false);
+        case IS_FALSE:
+            if (*isNull)
+            {
+                *isNull = false;
+                return BoolGetDatum(false);
+            }
+            else if (result == false)
+                return BoolGetDatum(true);
+			else
+                return BoolGetDatum(false);
+        case IS_NOT_TRUE:
+            if (*isNull)
+            {
+                *isNull = false;
+                return BoolGetDatum(true);
+            }
+            else if (result == false)
+                return BoolGetDatum(true);
+			else
+                return BoolGetDatum(false);
+        default:
+            elog(ERROR, "ExecEvalBooleanTest: unexpected booltesttype %d",
+                 (int) btest->booltesttype);
+            return (Datum) 0;  /* keep compiler quiet */
+    }
+}
+
+/* ----------------------------------------------------------------
  *		ExecEvalFieldSelect
  *
  *		Evaluate a FieldSelect node.
@@ -1262,6 +1369,18 @@
 			break;
 		case T_CaseExpr:
 			retDatum = ExecEvalCase((CaseExpr *) expression,
+									econtext,
+									isNull,
+									isDone);
+			break;
+		case T_NullTest:
+			retDatum = ExecEvalNullTest((NullTest *) expression,
+									econtext,
+									isNull,
+									isDone);
+			break;
+		case T_BooleanTest:
+			retDatum = ExecEvalBooleanTest((BooleanTest *) expression,
 									econtext,
 									isNull,
 									isDone);
diff -Naur pgsql.virg/src/backend/nodes/copyfuncs.c pgsql.dev/src/backend/nodes/copyfuncs.c
--- pgsql.virg/src/backend/nodes/copyfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/copyfuncs.c	Tue Jun 19 05:05:22 2001
@@ -1017,6 +1017,42 @@
 	return newnode;
 }
 
+/* ----------------
+ *		_copyNullTest
+ * ----------------
+ */
+static NullTest *
+_copyNullTest(NullTest *from)
+{
+	NullTest *newnode = makeNode(NullTest);
+
+	/*
+	 * copy remainder of node
+	 */
+	Node_Copy(from, newnode, arg);
+	newnode->nulltesttype = from->nulltesttype;
+
+	return newnode;
+}
+
+/* ----------------
+ *		_copyBooleanTest
+ * ----------------
+ */
+static BooleanTest *
+_copyBooleanTest(BooleanTest *from)
+{
+	BooleanTest *newnode = makeNode(BooleanTest);
+
+	/*
+	 * copy remainder of node
+	 */
+	Node_Copy(from, newnode, arg);
+	newnode->booltesttype = from->booltesttype;
+
+	return newnode;
+}
+
 static ArrayRef *
 _copyArrayRef(ArrayRef *from)
 {
@@ -2953,6 +2989,12 @@
 			break;
 		case T_CaseWhen:
 			retval = _copyCaseWhen(from);
+			break;
+		case T_NullTest:
+			retval = _copyNullTest(from);
+			break;
+		case T_BooleanTest:
+			retval = _copyBooleanTest(from);
 			break;
 		case T_FkConstraint:
 			retval = _copyFkConstraint(from);
diff -Naur pgsql.virg/src/backend/nodes/equalfuncs.c pgsql.dev/src/backend/nodes/equalfuncs.c
--- pgsql.virg/src/backend/nodes/equalfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/equalfuncs.c	Tue Jun 19 05:12:10 2001
@@ -1712,6 +1712,26 @@
 	return true;
 }
 
+static bool
+_equalNullTest(NullTest *a, NullTest *b)
+{
+	if (!equal(a->arg, b->arg))
+		return false;
+	if (a->nulltesttype != b->nulltesttype)
+		return false;
+	return true;
+}
+
+static bool
+_equalBooleanTest(BooleanTest *a, BooleanTest *b)
+{
+	if (!equal(a->arg, b->arg))
+		return false;
+	if (a->booltesttype != b->booltesttype)
+		return false;
+	return true;
+}
+
 /*
  * Stuff from pg_list.h
  */
@@ -2119,6 +2139,12 @@
 			break;
 		case T_CaseWhen:
 			retval = _equalCaseWhen(a, b);
+			break;
+		case T_NullTest:
+			retval = _equalNullTest(a, b);
+			break;
+		case T_BooleanTest:
+			retval = _equalBooleanTest(a, b);
 			break;
 		case T_FkConstraint:
 			retval = _equalFkConstraint(a, b);
diff -Naur pgsql.virg/src/backend/nodes/outfuncs.c pgsql.dev/src/backend/nodes/outfuncs.c
--- pgsql.virg/src/backend/nodes/outfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/outfuncs.c	Tue Jun 19 05:11:03 2001
@@ -1403,6 +1403,32 @@
 }
 
 /*
+ *	NullTest
+ */
+static void
+_outNullTest(StringInfo str, NullTest *node)
+{
+	appendStringInfo(str, " NULLTEST :arg ");
+	_outNode(str, node->arg);
+
+	appendStringInfo(str, " :nulltesttype %d ",
+				   node->nulltesttype);
+}
+
+/*
+ *	BooleanTest
+ */
+static void
+_outBooleanTest(StringInfo str, BooleanTest *node)
+{
+	appendStringInfo(str, " BOOLEANTEST :arg ");
+	_outNode(str, node->arg);
+
+	appendStringInfo(str, " :booltesttype %d ",
+				   node->booltesttype);
+}
+
+/*
  * _outNode -
  *	  converts a Node into ascii string and append it to 'str'
  */
@@ -1639,7 +1665,12 @@
 			case T_CaseWhen:
 				_outCaseWhen(str, obj);
 				break;
-
+			case T_NullTest:
+				_outNullTest(str, obj);
+				break;
+			case T_BooleanTest:
+				_outBooleanTest(str, obj);
+				break;
 			case T_VariableSetStmt:
 				break;
 			case T_SelectStmt:
diff -Naur pgsql.virg/src/backend/nodes/readfuncs.c pgsql.dev/src/backend/nodes/readfuncs.c
--- pgsql.virg/src/backend/nodes/readfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/readfuncs.c	Tue Jun 19 05:27:26 2001
@@ -860,6 +860,56 @@
 }
 
 /* ----------------
+ *		_readNullTest
+ *
+ *	NullTest is a subclass of Node
+ * ----------------
+ */
+static NullTest *
+_readNullTest(void)
+{
+	NullTest	*local_node;
+	char		*token;
+	int			length;
+
+	local_node = makeNode(NullTest);
+
+	token = pg_strtok(&length); /* eat :arg */
+	local_node->arg = nodeRead(true);	/* now read it */
+
+	token = pg_strtok(&length); /* eat :nulltesttype */
+	token = pg_strtok(&length); /* get nulltesttype */
+	local_node->nulltesttype = (NullTestType) atoi(token);
+
+	return local_node;
+}
+
+/* ----------------
+ *		_readBooleanTest
+ *
+ *	BooleanTest is a subclass of Node
+ * ----------------
+ */
+static BooleanTest *
+_readBooleanTest(void)
+{
+	BooleanTest	*local_node;
+	char		*token;
+	int			length;
+
+	local_node = makeNode(BooleanTest);
+
+	token = pg_strtok(&length); /* eat :arg */
+	local_node->arg = nodeRead(true);	/* now read it */
+
+	token = pg_strtok(&length); /* eat :booltesttype */
+	token = pg_strtok(&length); /* get booltesttype */
+	local_node->booltesttype = (BoolTestType) atoi(token);
+
+	return local_node;
+}
+
+/* ----------------
  *		_readVar
  *
  *	Var is a subclass of Expr
@@ -1966,6 +2016,10 @@
 		return_value = _readCaseExpr();
 	else if (length == 4 && strncmp(token, "WHEN", length) == 0)
 		return_value = _readCaseWhen();
+	else if (length == 8 && strncmp(token, "NULLTEST", length) == 0)
+		return_value = _readNullTest();
+	else if (length == 11 && strncmp(token, "BOOLEANTEST", length) == 0)
+		return_value = _readBooleanTest();
 	else
 		elog(ERROR, "badly formatted planstring \"%.10s\"...", token);
 
diff -Naur pgsql.virg/src/backend/optimizer/util/clauses.c pgsql.dev/src/backend/optimizer/util/clauses.c
--- pgsql.virg/src/backend/optimizer/util/clauses.c	Sat Jun 16 18:56:46 2001
+++ pgsql.dev/src/backend/optimizer/util/clauses.c	Tue Jun 19 05:02:53 2001
@@ -1580,6 +1580,10 @@
 					return true;
 			}
 			break;
+		case T_NullTest:
+			return walker(((NullTest *) node)->arg, context);
+		case T_BooleanTest:
+			return walker(((BooleanTest *) node)->arg, context);
 		case T_SubLink:
 			{
 				SubLink    *sublink = (SubLink *) node;
@@ -1930,6 +1934,26 @@
 				FLATCOPY(newnode, casewhen, CaseWhen);
 				MUTATE(newnode->expr, casewhen->expr, Node *);
 				MUTATE(newnode->result, casewhen->result, Node *);
+				return (Node *) newnode;
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest *ntest = (NullTest *) node;
+				NullTest *newnode;
+
+				FLATCOPY(newnode, ntest, NullTest);
+				MUTATE(newnode->arg, ntest->arg, Node *);
+				return (Node *) newnode;
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *btest = (BooleanTest *) node;
+				BooleanTest *newnode;
+
+				FLATCOPY(newnode, btest, BooleanTest);
+				MUTATE(newnode->arg, btest->arg, Node *);
 				return (Node *) newnode;
 			}
 			break;
diff -Naur pgsql.virg/src/backend/parser/gram.y pgsql.dev/src/backend/parser/gram.y
--- pgsql.virg/src/backend/parser/gram.y	Sat Jun 16 18:56:47 2001
+++ pgsql.dev/src/backend/parser/gram.y	Tue Jun 19 04:46:06 2001
@@ -4434,9 +4434,19 @@
 					 * (like Microsoft's).  Turn these into IS NULL exprs.
 					 */
 					if (exprIsNullConstant($3))
-						$$ = makeA_Expr(ISNULL, NULL, $1, NULL);
+						{
+							NullTest *n = makeNode(NullTest);
+							n->arg = $1;
+							n->nulltesttype = IS_NULL;
+							$$ = (Node *)n;
+						}
 					else if (exprIsNullConstant($1))
-						$$ = makeA_Expr(ISNULL, NULL, $3, NULL);
+						{
+							NullTest *n = makeNode(NullTest);
+							n->arg = $3;
+							n->nulltesttype = IS_NULL;
+							$$ = (Node *)n;
+						}
 					else
 						$$ = makeA_Expr(OP, "=", $1, $3);
 				}
@@ -4499,59 +4509,79 @@
 					n->agg_distinct = FALSE;
 					$$ = makeA_Expr(OP, "!~~*", $1, (Node *) n);
 				}
-
+		/* NullTest clause
+		 * Define SQL92-style Null test clause.
+		 * Allow two forms described in the standard:
+		 *  a IS NULL
+		 *  a IS NOT NULL
+		 * Allow two SQL extensions
+		 *  a ISNULL
+		 *  a NOTNULL
+		 */
 		| a_expr ISNULL
-				{	$$ = makeA_Expr(ISNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NULL;
+					$$ = (Node *)n;
+				}
 		| a_expr IS NULL_P
-				{	$$ = makeA_Expr(ISNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NULL;
+					$$ = (Node *)n;
+				}
 		| a_expr NOTNULL
-				{	$$ = makeA_Expr(NOTNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NOT_NULL;
+					$$ = (Node *)n;
+				}
 		| a_expr IS NOT NULL_P
-				{	$$ = makeA_Expr(NOTNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NOT_NULL;
+					$$ = (Node *)n;
+				}
 		/* IS TRUE, IS FALSE, etc used to be function calls
 		 *  but let's make them expressions to allow the optimizer
 		 *  a chance to eliminate them if a_expr is a constant string.
 		 * - thomas 1997-12-22
+		 *
+		 *  Created BooleanTest Node type, and changed handling
+		 *  for NULL inputs
+		 * - jec 2001-06-18
 		 */
 		| a_expr IS TRUE_P
 				{
-					A_Const *n = makeNode(A_Const);
-					n->val.type = T_String;
-					n->val.val.str = "t";
-					n->typename = makeNode(TypeName);
-					n->typename->name = xlateSqlType("bool");
-					n->typename->typmod = -1;
-					$$ = makeA_Expr(OP, "=", $1,(Node *)n);
+					BooleanTest *b = makeNode(BooleanTest);
+					b->arg = $1;
+					b->booltesttype = IS_TRUE;
+					$$ = (Node *)b;
 				}
 		| a_expr IS NOT FALSE_P
 				{
-					A_Const *n = makeNode(A_Const);
-					n->val.type = T_String;
-					n->val.val.str = "t";
-					n->typename = makeNode(TypeName);
-					n->typename->name = xlateSqlType("bool");
-					n->typename->typmod = -1;
-					$$ = makeA_Expr(OP, "=", $1,(Node *)n);
+					BooleanTest *b = makeNode(BooleanTest);
+					b->arg = $1;
+					b->booltesttype = IS_NOT_FALSE;
+					$$ = (Node *)b;
 				}
 		| a_expr IS FALSE_P
 				{
-					A_Const *n = makeNode(A_Const);
-					n->val.type = T_String;
-					n->val.val.str = "f";
-					n->typename = makeNode(TypeName);
-					n->typename->name = xlateSqlType("bool");
-					n->typename->typmod = -1;
-					$$ = makeA_Expr(OP, "=", $1,(Node *)n);
+					BooleanTest *b = makeNode(BooleanTest);
+					b->arg = $1;
+					b->booltesttype = IS_FALSE;
+					$$ = (Node *)b;
 				}
 		| a_expr IS NOT TRUE_P
 				{
-					A_Const *n = makeNode(A_Const);
-					n->val.type = T_String;
-					n->val.val.str = "f";
-					n->typename = makeNode(TypeName);
-					n->typename->name = xlateSqlType("bool");
-					n->typename->typmod = -1;
-					$$ = makeA_Expr(OP, "=", $1,(Node *)n);
+					BooleanTest *b = makeNode(BooleanTest);
+					b->arg = $1;
+					b->booltesttype = IS_NOT_TRUE;
+					$$ = (Node *)b;
 				}
 		| a_expr BETWEEN b_expr AND b_expr			%prec BETWEEN
 				{
diff -Naur pgsql.virg/src/backend/parser/parse_expr.c pgsql.dev/src/backend/parser/parse_expr.c
--- pgsql.virg/src/backend/parser/parse_expr.c	Sat Jun 16 18:56:47 2001
+++ pgsql.dev/src/backend/parser/parse_expr.c	Tue Jun 19 05:01:12 2001
@@ -510,6 +510,26 @@
 				break;
 			}
 
+		case T_NullTest:
+			{
+				NullTest   *n = (NullTest *) expr;
+
+				n->arg = transformExpr(pstate, n->arg, precedence);
+
+				result = (Node *) expr;
+				break;
+			}
+
+		case T_BooleanTest:
+			{
+				BooleanTest   *b = (BooleanTest *) expr;
+
+				b->arg = transformExpr(pstate, b->arg, precedence);
+
+				result = (Node *) expr;
+				break;
+			}
+
 			/*
 			 * Quietly accept node types that may be presented when we are
 			 * called on an already-transformed tree.
@@ -668,6 +688,12 @@
 			break;
 		case T_CaseWhen:
 			type = exprType(((CaseWhen *) expr)->result);
+			break;
+		case T_NullTest:
+			type = BOOLOID;
+			break;
+		case T_BooleanTest:
+			type = BOOLOID;
 			break;
 		case T_Ident:
 			/* is this right? */
diff -Naur pgsql.virg/src/backend/utils/adt/ruleutils.c pgsql.dev/src/backend/utils/adt/ruleutils.c
--- pgsql.virg/src/backend/utils/adt/ruleutils.c	Sat Jun 16 18:56:48 2001
+++ pgsql.dev/src/backend/utils/adt/ruleutils.c	Tue Jun 19 04:59:46 2001
@@ -1948,6 +1948,52 @@
 			}
 			break;
 
+		case T_NullTest:
+			{
+				NullTest		*ntest = (NullTest *) node;
+
+				get_rule_expr(ntest->arg, context);
+			    switch (ntest->nulltesttype)
+			    {
+			        case IS_NULL:
+						appendStringInfo(buf, "%s", " IS NULL ");
+						break;
+			        case IS_NOT_NULL:
+						appendStringInfo(buf, "%s", " IS NOT NULL ");
+						break;
+			        default:
+			            elog(ERROR, "get_rule_expr: unexpected nulltesttype %d",
+			                 (int) ntest->nulltesttype);
+				}
+			}
+			break;
+
+		case T_BooleanTest:
+			{
+				BooleanTest		*btest = (BooleanTest *) node;
+
+				get_rule_expr(btest->arg, context);
+			    switch (btest->booltesttype)
+			    {
+			        case IS_TRUE:
+						appendStringInfo(buf, "%s", " IS TRUE ");
+						break;
+			        case IS_NOT_FALSE:
+						appendStringInfo(buf, "%s", " IS NOT FALSE ");
+						break;
+			        case IS_FALSE:
+						appendStringInfo(buf, "%s", " IS FALSE ");
+						break;
+			        case IS_NOT_TRUE:
+						appendStringInfo(buf, "%s", " IS NOT TRUE ");
+						break;
+			        default:
+			            elog(ERROR, "get_rule_expr: unexpected booltesttype %d",
+			                 (int) btest->booltesttype);
+				}
+			}
+			break;
+
 		case T_SubLink:
 			get_sublink_expr(node, context);
 			break;
diff -Naur pgsql.virg/src/include/nodes/nodes.h pgsql.dev/src/include/nodes/nodes.h
--- pgsql.virg/src/include/nodes/nodes.h	Sat Jun 16 18:56:52 2001
+++ pgsql.dev/src/include/nodes/nodes.h	Tue Jun 19 04:46:52 2001
@@ -225,6 +225,8 @@
 	T_RowMarkXXX,				/* not used anymore; tag# available */
 	T_FkConstraint,
 	T_PrivGrantee,
+	T_NullTest,
+	T_BooleanTest,
 
 	/*
 	 * TAGS FOR FUNCTION-CALL CONTEXT AND RESULTINFO NODES (see fmgr.h)
diff -Naur pgsql.virg/src/include/nodes/parsenodes.h pgsql.dev/src/include/nodes/parsenodes.h
--- pgsql.virg/src/include/nodes/parsenodes.h	Sat Jun 16 18:56:52 2001
+++ pgsql.dev/src/include/nodes/parsenodes.h	Tue Jun 19 05:25:18 2001
@@ -1054,6 +1054,50 @@
 	Node	   *result;			/* substitution result */
 } CaseWhen;
 
+/* ----------------
+ * NullTest
+ *
+ * NullTest represents the operation of comparing a value to a NULL
+ * value.  At runtime, the input expression is expected to be IS NULL,
+ * IS NOT NULL, ISNULL, or NOTNULL.
+ * The appropriate test is performed and returned as a Datum.
+ * ----------------
+ */
+
+typedef enum NullTestType
+{
+	IS_NULL, IS_NOT_NULL
+} NullTestType;
+
+typedef struct NullTest
+{
+	NodeTag			type;
+	Node			*arg;			/* input expression */
+	NullTestType	nulltesttype;	/* IS NULL, IS NOT NULL */
+} NullTest;
+
+/* ----------------
+ * BooleanTest
+ *
+ * BooleanTest represents the operation of comparing a value to a TRUE
+ * or FALSE value.  At runtime, the input expression is expected to be
+ * IS TRUE, IS NOT FALSE, IS FALSE, or IS NOT TRUE.
+ * The appropriate test is performed and returned as a Datum.
+ * ----------------
+ */
+
+typedef enum BoolTestType
+{
+	IS_TRUE, IS_NOT_FALSE, IS_FALSE, IS_NOT_TRUE
+} BoolTestType;
+
+typedef struct BooleanTest
+{
+	NodeTag			type;
+	Node			*arg;			/* input expression */
+	BoolTestType	booltesttype;	/* IS_TRUE, IS_NOT_FALSE, IS_FALSE, IS_NOT_TRUE */
+} BooleanTest;
+
 /*
  * ColumnDef - column definition (used in various creates)
  *

#22

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Joe Conway (#21)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

"Joe Conway" <joseph.conway@home.com> writes:

-- Should I increment catversion.h as part of the patch (I didn't in this
patch), or is that usually centrally controlled by Bruce (or whomever
commits the change)?

It's good to put the catversion bump into the patch, else the committer
might forget to do it.

-- IMHO, if we are going to keep the (a = null) to (a is null) conversion,
then there should also be a similar conversion from (a != null) to (a is not
null). Otherwise the two operations which may be expected to be
complimentary (as evidenced by at least one recent post) are not.

I'd resist this. The only reason the =NULL hack is in there at all is
to support Access97. We shouldn't extend the deviation from standards
further than the minimum needed to do that. The hack is fundamentally
inconsistent anyway, and breaking our standards compliance further in
pursuit of bogus consistency seems misguided.

Personally I'd rather take out the =NULL conversion anyway...

-- If I have interpreted SQL92 correctly UNKNOWN IS TRUE should return
FALSE, and UNKNOWN IS NOT TRUE is equivalent to NOT (UNKNOWN IS TRUE) ==>
TRUE. Is this correct?

Yes. Table 15 is pretty illegible in the ASCII draft copies of SQL92
and SQL99, but the PDF version of SQL99 is okay, and it makes clear
what you'd expect:

input IS TRUE IS FALSE IS UNKNOWN

true true false false
false false true false
unknown false false true

and then the NOT variants are defined as

x IS NOT foo == NOT (x IS foo)

I'll try to look over and commit the patch later today.

For extra credit ;-) ... if you'd like to learn a little bit about the
optimizer, think about teaching clause_selectivity() in
optimizer/path/clausesel.c how to estimate the selectivity of these new
expression nodes. In the case where the argument is a boolean column
that we have statistics for, it should be possible to derive the correct
answer (including accounting for NULLs). If the argument is more
complex than that, you probably can't do anything really intelligent,
but you could handwave away NULLs and then compute the appropriate
function of the clause_selectivity() of the argument.

regards, tom lane

#23

joseph.conway@home.com

over 24 years ago

In reply to: Zeugswetter Andreas SB (#1)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

-- Should I increment catversion.h as part of the patch (I didn't in

this

patch), or is that usually centrally controlled by Bruce (or whomever
commits the change)?

It's good to put the catversion bump into the patch, else the committer
might forget to do it.

OK -- will do next time.

further than the minimum needed to do that. The hack is fundamentally
inconsistent anyway, and breaking our standards compliance further in
pursuit of bogus consistency seems misguided.

Personally I'd rather take out the =NULL conversion anyway...

I'd second that.

Yes. Table 15 is pretty illegible in the ASCII draft copies of SQL92
and SQL99, but the PDF version of SQL99 is okay, and it makes clear
what you'd expect:

OT -- I need to buy a copy of SQL99, but it seems to be split into several
parts (that didn't exist for SQL92). Which one (or more) are the most useful
for PostgreSQL hacking?

For extra credit ;-) ... if you'd like to learn a little bit about the
optimizer, think about teaching clause_selectivity() in
optimizer/path/clausesel.c how to estimate the selectivity of these new
expression nodes. In the case where the argument is a boolean column
that we have statistics for, it should be possible to derive the correct
answer (including accounting for NULLs). If the argument is more
complex than that, you probably can't do anything really intelligent,
but you could handwave away NULLs and then compute the appropriate
function of the clause_selectivity() of the argument.

Sure, I love a challenge ;) -- I'll take a look.

One issue I noticed this morning with this patch is that 't' and 'f' are no
longer being implicitly cast into boolean, i.e. test=# select 't' is true;
?column?
----------
f
(1 row)

test=# select 't'::bool is true;
?column?
----------
t
(1 row)

Any thoughts on where look to fix this?

Thanks,

-- Joe

#24

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Joe Conway (#23)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

"Joe Conway" <joseph.conway@home.com> writes:

OT -- I need to buy a copy of SQL99, but it seems to be split into several
parts (that didn't exist for SQL92). Which one (or more) are the most useful
for PostgreSQL hacking?

I find that Part 2 is almost the only one I ever look at. I'm not even
sure what's in the other parts ...

One issue I noticed this morning with this patch is that 't' and 'f' are no
longer being implicitly cast into boolean, i.e. test=# select 't' is true;
?column?
----------
f
(1 row)

Now that you mention it, it looks like all of our constructs that expect
boolean fail to coerce unknown-type literals into bools. The rest of
them raise errors, eg:

regression=# select not 't';
ERROR: argument to NOT is type 'unknown', not 'bool'

but this is pretty bogus. ISTM all these places should try to coerce
their inputs to bool before they complain. This involves calling
can_coerce_type and then coerce_type; it's tedious enough that it'd be
worth setting up a subroutine to do it. I'll add something for that,
and fix the other places too.

regards, tom lane

#25

joseph.conway@home.com

over 24 years ago

In reply to: Zeugswetter Andreas SB (#1)

1 attachment(s)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

For extra credit ;-) ... if you'd like to learn a little bit about the
optimizer, think about teaching clause_selectivity() in
optimizer/path/clausesel.c how to estimate the selectivity of these new
expression nodes. In the case where the argument is a boolean column
that we have statistics for, it should be possible to derive the correct
answer (including accounting for NULLs). If the argument is more
complex than that, you probably can't do anything really intelligent,
but you could handwave away NULLs and then compute the appropriate
function of the clause_selectivity() of the argument.

Attached is my initial attempt to teach clause_selectivity about BooleanTest
nodes, for review and comment. Please let me know if I'm headed in the right
direction.

Thanks,

Joe

Attachments:

booltestsel_00.diffapplication/octet-stream; name=booltestsel_00.diffDownload

diff -Naur pgsql.virg/src/backend/optimizer/path/clausesel.c pgsql.dev/src/backend/optimizer/path/clausesel.c
--- pgsql.virg/src/backend/optimizer/path/clausesel.c	Thu Jun 21 04:18:11 2001
+++ pgsql.dev/src/backend/optimizer/path/clausesel.c	Sat Jun 23 18:32:17 2001
@@ -22,6 +22,7 @@
 #include "optimizer/plancat.h"
 #include "optimizer/restrictinfo.h"
 #include "parser/parsetree.h"
+#include "utils/builtins.h"
 #include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 
@@ -408,6 +409,14 @@
 											 varRelid);
 			}
 		}
+	}
+	else if (IsA(clause, BooleanTest))
+	{
+		s1 = (Selectivity) DatumGetFloat8(DirectFunctionCall4(booltestsel,
+											 PointerGetDatum(root),
+											 (Datum)0,
+											 PointerGetDatum(clause),
+											 Int32GetDatum(varRelid)));
 	}
 	else if (IsA(clause, Param))
 	{
diff -Naur pgsql.virg/src/backend/utils/adt/selfuncs.c pgsql.dev/src/backend/utils/adt/selfuncs.c
--- pgsql.virg/src/backend/utils/adt/selfuncs.c	Thu Jun 21 04:18:12 2001
+++ pgsql.dev/src/backend/utils/adt/selfuncs.c	Sat Jun 23 19:01:23 2001
@@ -934,6 +934,181 @@
 }
 
 /*
+ *		booltestsel		- Selectivity of BooleanTest Node.
+ */
+Datum
+booltestsel(PG_FUNCTION_ARGS)
+{
+	Query		   *root = (Query *) PG_GETARG_POINTER(0);
+	Oid				operator = PG_GETARG_OID(1);
+	List		   *args = (List *) PG_GETARG_POINTER(2);
+	int				varRelid = PG_GETARG_INT32(3);
+	Var			   *var;
+	Node		   *clause;
+	BooleanTest	   *btest;
+	Node		   *arg;
+	Oid				relid;
+	HeapTuple		statsTuple;
+	Datum		   *values;
+	int				nvalues;
+	float4		   *numbers;
+	int				nnumbers;
+	double			selec = DEFAULT_EQ_SEL;
+
+	operator = operator;	/* keep compiler quiet */
+
+	clause = (Node *) args;
+	/*
+	 * If expression is not BooleanTest
+	 * then punt and return a default estimate.
+	 */
+	if (!IsA(clause, BooleanTest))
+		PG_RETURN_FLOAT8(DEFAULT_EQ_SEL);
+
+	btest = (BooleanTest *) clause;
+	arg = (Node *) btest->arg;
+
+	/*
+	 * Ignore any binary-compatible relabeling
+	 */
+	if (IsA(arg, RelabelType))
+		arg = ((RelabelType *) arg)->arg;
+
+	if (IsA(arg, Var) && (varRelid == 0 || varRelid == ((Var *) arg)->varno))
+		var = (Var *) arg;
+	else	/* punt */
+		PG_RETURN_FLOAT8(DEFAULT_EQ_SEL);
+
+	relid = getrelid(var->varno, root->rtable);
+	if (relid == InvalidOid)
+		PG_RETURN_FLOAT8(DEFAULT_EQ_SEL);
+
+	/* get stats for the attribute, if available */
+	statsTuple = SearchSysCache(STATRELATT,
+								ObjectIdGetDatum(relid),
+								Int16GetDatum(var->varattno),
+								0, 0);
+	if (HeapTupleIsValid(statsTuple))
+	{
+		Form_pg_statistic stats;
+
+		stats = (Form_pg_statistic) GETSTRUCT(statsTuple);
+
+		/*
+		 * Var is being compared to a known non-null constant,
+		 * i.e. true, false, or unknown
+		 */
+		if (get_attstatsslot(statsTuple, var->vartype, var->vartypmod,
+							 STATISTIC_KIND_MCV, InvalidOid,
+							 &values, &nvalues,
+							 &numbers, &nnumbers))
+		{
+			bool	match = false;
+			int		i;
+
+			if (btest->booltesttype == IS_UNKNOWN || btest->booltesttype == IS_NOT_UNKNOWN)
+			{
+				switch (btest->booltesttype)
+			    {
+			        case IS_UNKNOWN:
+						/*
+						 * Only select unknown (null) values.
+						 * Use null fraction directly.
+						 */
+						selec = stats->stanullfrac;
+						break;
+			        case IS_NOT_UNKNOWN:
+						/*
+						 * Only select not unknown (not null) values.
+						 * Calculate from null fraction directly.
+						 */
+						selec = 1.0 - stats->stanullfrac;
+						break;
+			        default:
+			            elog(ERROR, "booltestsel: unexpected booltesttype %d",
+			                 (int) btest->booltesttype);
+			            return (Datum) 0;  /* keep compiler quiet */
+				}
+			}
+			else
+			{
+				for (i = 0; i < nvalues; i++)
+				{
+					switch (btest->booltesttype)
+				    {
+				        case IS_TRUE:
+							match = (values[i] == true);
+							break;
+				        case IS_NOT_TRUE:
+							match = (values[i] == false);
+							break;
+				        case IS_FALSE:
+							match = (values[i] == false);
+							break;
+				        case IS_NOT_FALSE:
+							match = (values[i] == true);
+							break;
+				        default:
+				            elog(ERROR, "booltestsel: unexpected booltesttype %d",
+				                 (int) btest->booltesttype);
+				            return (Datum) 0;  /* keep compiler quiet */
+					}
+					if (match == true)
+						break;
+				}
+				if (match == true)
+				{
+					/*
+					 * Constant is "=" to this common value.  We know
+					 * selectivity exactly (or as exactly as VACUUM
+					 * could calculate it, anyway).
+					 */
+					selec = numbers[i];
+				}
+				else
+				{
+					/*
+					 * Constant is not null and not "=" to this common
+					 * value. Since only true, false, and null are
+					 * possible, calculate it.
+					 */
+					selec = 1.0 - numbers[i] - stats->stanullfrac;
+				}
+			}
+		}
+		else
+		{
+			/*
+			 * no most-common-value info available
+			 */
+			values = NULL;
+			numbers = NULL;
+			nvalues = nnumbers = 0;
+		}
+
+		ReleaseSysCache(statsTuple);
+	}
+	else
+	{
+		/*
+		 * No VACUUM ANALYZE stats available, so make a guess using
+		 * the number of distinct values (2) and assuming they are
+		 * equally common.  (The guess is unlikely to be very good,
+		 * but we do know a few special cases.)
+		 */
+		selec = 0.5;
+	}
+
+	/* result should be in range, but make sure... */
+	if (selec < 0.0)
+		selec = 0.0;
+	else if (selec > 1.0)
+		selec = 1.0;
+
+	PG_RETURN_FLOAT8((float8) selec);
+}
+
+/*
  *		eqjoinsel		- Join selectivity of "="
  */
 Datum
diff -Naur pgsql.virg/src/include/utils/builtins.h pgsql.dev/src/include/utils/builtins.h
--- pgsql.virg/src/include/utils/builtins.h	Thu Jun 21 04:18:21 2001
+++ pgsql.dev/src/include/utils/builtins.h	Sat Jun 23 15:35:02 2001
@@ -355,6 +355,7 @@
 extern Datum icregexnesel(PG_FUNCTION_ARGS);
 extern Datum nlikesel(PG_FUNCTION_ARGS);
 extern Datum icnlikesel(PG_FUNCTION_ARGS);
+extern Datum booltestsel(PG_FUNCTION_ARGS);
 
 extern Datum eqjoinsel(PG_FUNCTION_ARGS);
 extern Datum neqjoinsel(PG_FUNCTION_ARGS);

#26

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Joe Conway (#25)

Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

"Joe Conway" <joseph.conway@home.com> writes:

Attached is my initial attempt to teach clause_selectivity about BooleanTest
nodes, for review and comment. Please let me know if I'm headed in the right
direction.

Looks like you're headed the right way. A few comments ---

1. Don't forget to add a routine for NullTest nodes, too. These can be
simpler, since you know that all you care about is the stanullfrac; no
need to look at common values.

2. I don't see any real good reason to force booltestsel() (likewise
nulltestsel(), when you add it) to have the same call signature as the
various operator selectivity routines. Those have to be looked up in a
table and called via fmgr, but there's no reason to think that
booltestsel will ever be called in the same way. We could certainly
lose the useless "operator" argument. I'd be inclined to declare
booltestsel as

Selectivity booltestsel(Query *root,
BooleanTest *clause,
int varRelid);

and skip *all* the fmgr notational cruft. The only trouble with this is
we'd not want to put such a declaration into builtins.h, because it'd
force including a bunch more .h files into builtins.h, which is ugly
and might even cause some circularity problems. There are already some
selectivity-related things in builtins.h that don't really belong there.
I am starting to think that selfuncs.c deserves its own header file
separate from builtins.h, so that it doesn't clutter builtins.h with a
bunch of selectivity-specific stuff. (Comments anyone?)

3. You should be using tests like "if (DatumGetBool(values[i]))" or
"if (!DatumGetBool(values[i]))" to examine the MCV-list values, not
"if (values[i] == true)". The latter is a type cheat: it assumes that
Datum can be implicitly converted to bool, which is only sort-of true at
the moment and might be not-at-all-true in the future. You could write
"if (DatumGetBool(values[i]) == true)" and satisfy the strict-typing
part of this concern, but there's also a more stylistic thing here.
Personally I find "if (boolean == true)" or "if (boolean == false)"
to be poor style, it should be just "if (boolean)" or "if (!boolean)".
This gets back to whether you consider boolean to be a logically
distinct type from int, I suppose. Feel free to ignore that part
if you strongly disagree.

4. I don't think that DEFAULT_EQ_SEL is an appropriate default result
for this routine. Arguably, it should elog(ERROR) if not passed a
BooleanTest node, so the first one or two occurrences are nonissues
anyway. However, if you find a non-Var argument or a Var that you
can't get statistics for, you do need to return some sort of reasonable
default, and that I think is not the best. In any case the default should
take the test type into account. For IS_NULL and IS_UNKNOWN, I'd think
the default assumption should be small but not zero (0.01 maybe? Any
thoughts out there?). IS_NOT_NULL/UNKNOWN should obviously yield one
minus whatever we use for IS_NULL. It seems reasonable to use 0.5
as default for IS_TRUE and the other three BooleanTests.

5. Actually, for IS_TRUE and the other three BooleanTests, what you
should probably do with a non-Var input is ignore the possibility of a
NULL value and just try to estimate the probability of TRUE vs FALSE,
which you can do by recursing back to clause_selectivity() on the
argument (note this would be invalid for NullTest where the argument
isn't necessarily boolean, but it's legit for BooleanTests). Then
you use either that result or one minus that result, depending on
which BooleanTest you're dealing with.

6. The way you are working with the MCV values seems unnecessarily
complicated. I had a hard time seeing whether it was generating the
right answer (actually, it demonstrably isn't, since for example you
produce the same result for IS_TRUE and IS_NOT_FALSE, which ought to
differ by the stanullfrac amount; but it's too complex anyway).
I'd be inclined to do this in one of two ways:
1. Scan the MCV array to find the entry for "true", and work
with its frequency and the stanullfrac frequency to derive
the appropriate answer depending on the test type.
2. Always use the first (most common val's) frequency. This
value is either true or false, so adjust accordingly.
The simplest way would be to derive true's frequency as
either freq[0] or 1 - freq[0] - stanullfrac, and then
proceed as in #1.
#2 is a little more robust, since it would still work if for some reason
the MCV list contains only an entry for true or only an entry for false.
(I believe that could happen, if the column contains only one value and
possibly some nulls.)

You're off to an excellent start though; you seem to be finding your
way through the code very well.

regards, tom lane

#27