AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Started by Zeugswetter Andreas SBover 24 years ago27 messages
#1Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at

Yes, column = NULL should *never* return true according to the spec (it
should always return NULL in fact as stated). The reason for breaking
with the spec is AFAIK to work with broken microsoft clients that seem to
think that =NULL is a meaningful test and generate queries using that.

Microsoft Access is the guilty party, IIRC. I recently tried to stir up
some interest in changing this behavior back to the standard, but
apparently there are still too many people using broken versions of
Access.

Actually I am not sure whether the column = NULL syntax is even defined
or allowed in SQL92 (e.g. Informix interprets the NULL as column name in
this context and errs out).
If not it would simply be an extension to the standard with the defined
behavior of beeing identical to "column is null".

Andreas

#2Tom Ivar Helbekkmo
tih@kpnQwest.no
In reply to: Zeugswetter Andreas SB (#1)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:

Actually I am not sure whether the column = NULL syntax is even defined
or allowed in SQL92 (e.g. Informix interprets the NULL as column name in
this context and errs out).

I don't have the standard handy, but I do have Joe Celko's book, "Data
& Databases: Concepts in Practice". He says (in section 8.2, under
the heading "Multivalued Logic"):

A NULL cannot be compared to another NULL or to a value
with what Dr. Codd called a theta operator and what
programmers call a comparison operator (equal, not equal,
less than, greater than, and so forth). This results in
a three-valued logic, which has an UNKNOWN in addition
to TRUE and FALSE. [...] UNKNOWN is a logical value and
not the same as a NULL, which is a data value. That is
why you have to say X IS [NOT] NULL in SQL and not use
X = NULL instead. Theta operators are expressions of the
form X <comp op> Y; when X or Y or both are NULL, theta
operators will return an UNKNOWN and not a NULL.

He goes on to explain three-valued logic in more detail, showing truth
tables according to Jan Lukasiewicz (the inventor of RPN), and says,
of SQL-92, that it "is comforting to see that [it has] the same truth
tables as the three-valued system of Lukasiewicz". Further, he says:

SQL-92 added a new predicate of the form

<search condition> IS [NOT] TRUE | FALSE | UNKNOWN

which will let you map any combination of three-valued
logic to the two Boolean values.

A quick test run with psql shows that PostgreSQL does not properly
implement three-valued logic: it does not recognize the UNKNOWN
keyword alongside TRUE and FALSE, in any situation. It will also
return boolean truth values for comparisons with NULL values, using
them as "real" data values in the comparison. Worse (IMHO), this is
not consistent: while a test for "column = NULL" will return rows
where that is true, and a test for "not column = NULL" will return the
rest, "column <> NULL" returns no rows! This means that the theta
operators are not all treated the same way, which is surely wrong!

It seems to me that the idea of NULL as an unkown data value and
UNKNOWN as the corresponding truth value, combined with the rules for
propagation of NULL in mathematical operations, of UNKNOWN in truth
operations, and from NULL to UNKNOWN by theta operators, is a very
clean, intuitive way of handling these issues. It feels right! :-)

-tih
--
The basic difference is this: hackers build things, crackers break them.

#3Peter Eisentraut
peter_e@gmx.net
In reply to: Zeugswetter Andreas SB (#1)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Zeugswetter Andreas SB writes:

Actually I am not sure whether the column = NULL syntax is even defined
or allowed in SQL92

I think you're right.

(e.g. Informix interprets the NULL as column name in this context and
errs out).

NULL is a reserved word, so this would be wrong.

--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter

#4Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Tom Ivar Helbekkmo (#2)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

On 7 Jun 2001, Tom Ivar Helbekkmo wrote:

Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:

Actually I am not sure whether the column = NULL syntax is even defined
or allowed in SQL92 (e.g. Informix interprets the NULL as column name in
this context and errs out).

He goes on to explain three-valued logic in more detail, showing truth
tables according to Jan Lukasiewicz (the inventor of RPN), and says,
of SQL-92, that it "is comforting to see that [it has] the same truth
tables as the three-valued system of Lukasiewicz". Further, he says:

SQL-92 added a new predicate of the form

<search condition> IS [NOT] TRUE | FALSE | UNKNOWN

which will let you map any combination of three-valued
logic to the two Boolean values.

A quick test run with psql shows that PostgreSQL does not properly
implement three-valued logic: it does not recognize the UNKNOWN
keyword alongside TRUE and FALSE, in any situation. It will also
return boolean truth values for comparisons with NULL values, using
them as "real" data values in the comparison. Worse (IMHO), this is
not consistent: while a test for "column = NULL" will return rows
where that is true, and a test for "not column = NULL" will return the
rest, "column <> NULL" returns no rows! This means that the theta
operators are not all treated the same way, which is surely wrong!

That's the nature of the hack we're talking about. It's a grammar level
hack to turn a specific sequence of tokens (= NULL) into IS NULL due
to a client's generated queries. If you're comparing something other
than the constant NULL, it should do what is expected (ie, a comparison
between a NULL in a table or even CAST(NULL as INT4) does the "right"
thing).

I think adding IS UNKNOWN would probably be trivial (I think the code is
basically there in IS NULL.)

#5Tom Ivar Helbekkmo
tih@kpnQwest.no
In reply to: Stephan Szabo (#4)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

That's the nature of the hack we're talking about. It's a grammar
level hack to turn a specific sequence of tokens (= NULL) into IS
NULL due to a client's generated queries.

Aha! Sorry -- I jumped in late in the discussion without checking
back to see how it started...

OK, I've already said that I like the cleanliness and orthogonality of
NULL is a missing data value, UNKNOWN is a missing truth value, both
propagate in expressions, comparisons with NULL generate UNKNOWN, and
you can use the special comparisons IS [NOT] NULL and IS [NOT] UNKNOWN
to get plain, two-valued Boolean truth values out of them.

The Microsoft compatibility hack is ugly, and should be either a)
removed, b) expanded to include the other comparison operators and
documented as a PostgreSQL proprietary extension, or c) made into a
special feature that's turned on at will by a SET command. I would
applaud a), approve of c), and be dismayed by b).

I think adding IS UNKNOWN would probably be trivial (I think the
code is basically there in IS NULL.)

But if it's implemented, shouldn't the code also differentiate between
UNKNOWN and NULL, by not (as now) using the latter to represent the
former? Or do I misunderstand how it's handled now?

-tih
--
The basic difference is this: hackers build things, crackers break them.

#6Tom Ivar Helbekkmo
tih@kpnQwest.no
In reply to: Zeugswetter Andreas SB (#1)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:

Actually I am not sure whether the column = NULL syntax is even
defined or allowed in SQL92

I've just checked, by reading the relevant paragraphs and studying the
BNF, and the standard says that any comparison of the form X <comp op>
Y is unknown if X or Y (or both) is NULL, including the case where
NULL is given as an explicit constant. So, SQL92 clearly demands that
"column = NULL" is UNKNOWN, never TRUE or FALSE.

-tih
--
The basic difference is this: hackers build things, crackers break them.

#7Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Tom Ivar Helbekkmo (#5)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

On 7 Jun 2001, Tom Ivar Helbekkmo wrote:

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

That's the nature of the hack we're talking about. It's a grammar
level hack to turn a specific sequence of tokens (= NULL) into IS
NULL due to a client's generated queries.

Aha! Sorry -- I jumped in late in the discussion without checking
back to see how it started...

OK, I've already said that I like the cleanliness and orthogonality of
NULL is a missing data value, UNKNOWN is a missing truth value, both
propagate in expressions, comparisons with NULL generate UNKNOWN, and
you can use the special comparisons IS [NOT] NULL and IS [NOT] UNKNOWN
to get plain, two-valued Boolean truth values out of them.

The Microsoft compatibility hack is ugly, and should be either a)
removed, b) expanded to include the other comparison operators and
documented as a PostgreSQL proprietary extension, or c) made into a
special feature that's turned on at will by a SET command. I would
applaud a), approve of c), and be dismayed by b).

c is the most likely thing to happen probably.

I think adding IS UNKNOWN would probably be trivial (I think the
code is basically there in IS NULL.)

But if it's implemented, shouldn't the code also differentiate between
UNKNOWN and NULL, by not (as now) using the latter to represent the
former? Or do I misunderstand how it's handled now?

Within what's there (using null as unknown), the two tests are nearly
identical and would probably be just a grammar change. Creating a
separate unknown would be more difficult, and I'm not sure it's necessary
to make the distinction. NULL is an unknown value, I'm not sure that
you'd need a separate unknown value specifically for booleans.

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB (#1)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:

Actually I am not sure whether the column = NULL syntax is even defined
or allowed in SQL92 (e.g. Informix interprets the NULL as column name in
this context and errs out).

Strictly speaking, SQL92 would require you to write
foo = CAST (NULL AS type-of-foo)
However, we allow unadorned NULL in other contexts as a shorthand for
the CAST notation, so it's inconsistent of us to say that in this
context it means something different.

The real problem with accepting this Microsoftism is that it's a trap
for unwary programmers. Case 1: someone who's not studied SQL in detail
might experiment with examples involving "foo = NULL" and jump to
reasonable but entirely incorrect conclusions about how comparisons
involving NULL operate. Case 2: someone who *has* studied SQL, and is
also aware that we accept unadorned NULLs, will also draw the wrong
conclusions about what this construct will do. Bottom line: this kluge
surprises everyone except those who already know it exists. I don't
like systems that surprise their users in inconsistent ways.

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Ivar Helbekkmo (#2)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Tom Ivar Helbekkmo <tih@kpnQwest.no> quotes:

... This results in
a three-valued logic, which has an UNKNOWN in addition
to TRUE and FALSE. [...] UNKNOWN is a logical value and
not the same as a NULL, which is a data value.

SQL92 is not very clear about whether NULL and UNKNOWN are distinct,
but it is worth noticing that their truth tables for comparison
operators, and/or/not, etc, only mention unknown --- never null ---
as a possible value of a boolean condition. SQL99 clarifies the
intent:

The data type boolean comprises the distinct truth values true and
false. Unless prohibited by a NOT NULL constraint, the boolean
data type also supports the unknown truth value as the null value.
This specification does not make a distinction between the null
value of the boolean data type and the unknown truth value that is
the result of an SQL <predicate>, <search condition>, or <boolean
value expression>; they may be used interchangeably to mean exactly
the same thing.

Which in fact is what Postgres does.

A quick test run with psql shows that PostgreSQL does not properly
implement three-valued logic: it does not recognize the UNKNOWN
keyword alongside TRUE and FALSE, in any situation.

We do not currently have correct implementations of IS TRUE, IS FALSE,
or IS UNKNOWN (IS TRUE/FALSE are in there but give the wrong result
for null inputs). This is on my to-do list to fix; not sure if the
master TODO list mentions it or not. Actually it'd be a good project
for a newbie hacker who wants to learn about the backend's
expression-handling machinery. Anyone want to take it on?

It's also worth noticing that our implementation of IS NULL isn't really
up to speed: the spec allows the argument to be a row value constructor,
not just a scalar. But we mostly don't have support for row-value-
constructor expressions anyway (it's not an Entry SQL feature).

regards, tom lane

#10Joe Conway
joe@conway-family.com
In reply to: Zeugswetter Andreas SB (#1)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

We do not currently have correct implementations of IS TRUE, IS FALSE,
or IS UNKNOWN (IS TRUE/FALSE are in there but give the wrong result
for null inputs). This is on my to-do list to fix; not sure if the
master TODO list mentions it or not. Actually it'd be a good project
for a newbie hacker who wants to learn about the backend's
expression-handling machinery. Anyone want to take it on?

I'd like to finish up the has_table_privilege function over the next week or
so and then take this on. Can you point me in a direction to start looking?

Thanks,

Joe

#11Thomas Lockhart
lockhart@fourpalms.org
In reply to: Zeugswetter Andreas SB (#1)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

The real problem with accepting this Microsoftism is that it's a trap
for unwary programmers. Case 1: someone who's not studied SQL in detail
might experiment with examples involving "foo = NULL" and jump to
reasonable but entirely incorrect conclusions about how comparisons
involving NULL operate. Case 2: someone who *has* studied SQL, and is
also aware that we accept unadorned NULLs, will also draw the wrong
conclusions about what this construct will do. Bottom line: this kluge
surprises everyone except those who already know it exists. I don't
like systems that surprise their users in inconsistent ways.

These were all points that were brought up and discussed when the hack
was implemented. At the time, the trade between implementing a construct
which is *not allowed* in SQL9x vs enabling the M$Access community to
migrate to PostgreSQL was hashed over and a consensus was reached that
the benefits to allowing it outweighed the drawbacks.

I was of the initial opinion that we should not support M$ pathelogical
non-standard constructs (jeez, they should know better, and probably do,
so it is likely a pathetic attempt to lock in users). But since the
construct is not allowed (or useless), why would anyone feel they need
to use it?

Clearly it is not the case that "this kluge surprises everyone except
those who already know it exists." We have had a strong consensus that
SQL9x standard constructs should be the norm for PostgreSQL, so no need
to rehash that. The issue boils down to, as it did when it first came
up, whether we will make it easier for M$Access users to start migrating
to PostgreSQL. If newer versions of Access emit standard constructs,
then it would be even easier to say that we should jettison the
construct. But if not, istm that we are killing a usability feature on
principle, not from need, and it may be premature to do that.

Using a "SET xxx" switch is not a step in the right direction istm since
it requires an extra PostgreSQL-specific command to get things working
with Access. We aren't teaching SQL, we are trying to let people get to
their data.

btw, I *was* suprised to see the "IS UNKNOWN" construct. It's been
lurking in my reference books for years. afaict it is very uncommonly
used, since most RDBMSes accept IS NULL as an equivalent. But is is
trivial to add to the grammar.

- Thomas

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#10)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

"Joe Conway" <joe@conway-family.com> writes:

We do not currently have correct implementations of IS TRUE, IS FALSE,
or IS UNKNOWN (IS TRUE/FALSE are in there but give the wrong result
for null inputs). This is on my to-do list to fix; not sure if the
master TODO list mentions it or not. Actually it'd be a good project
for a newbie hacker who wants to learn about the backend's
expression-handling machinery. Anyone want to take it on?

I'd like to finish up the has_table_privilege function over the next week or
so and then take this on. Can you point me in a direction to start looking?

The way things currently work is that gram.y translates "x IS TRUE" etc
to "x = true" etc. This is wrong because it does the wrong thing for
null input. Another objection is that it's impossible for ruleutils.c
to reverse-list the expression tree in its original form.

IS [NOT] NULL is handled a little differently: gram.y generates a
specialized Expr node, which parse_expr.c translates to a function call
on the specialized functions nullvalue() and nonnullvalue()
respectively. I don't much care for this implementation either, again
partly because ruleutils.c has to be uglified to deal with it, but
partly because the optimizer can't cheaply recognize IS NULL tests
either.

I'd like to see all eight of these guys translated into a specialized
kind of expression node, called perhaps BooleanTest. Actually, it'd
probably be wise to keep IS NULL separate from the six boolean tests,
with an eye to the future when it will need to support nonscalar
arguments. So maybe BooleanTest and NullTest node types, each with a
field showing exactly which test is wanted.

Adding a new expression node type is a straightforward but rather
tedious exercise in teaching some dozens of places what to do with it.
A grep for existing expression node types, such as CaseExpr or
FieldSelect or RelabelType, will give you a good idea what needs to be
done.

regards, tom lane

#13Thomas Lockhart
lockhart@fourpalms.org
In reply to: Zeugswetter Andreas SB (#1)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

The way things currently work is that gram.y translates "x IS TRUE" etc
to "x = true" etc. This is wrong because it does the wrong thing for
null input. Another objection is that it's impossible for ruleutils.c
to reverse-list the expression tree in its original form.

fyi, in case it helps: we used to have gram.y translate these into
function calls, rather than the operator expression. But that precluded
the optimizer from ever having a shot at optimizing out "const = const"
kinds of expressions and other fluff.

If we go to a specialized node in the parse tree, then the optimizer
could be taught to handle that, which is better than the original
straight function call which would have masked things too much.

- Thomas

#14Tom Ivar Helbekkmo
tih@kpnQwest.no
In reply to: Thomas Lockhart (#11)
Re: Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Thomas Lockhart <lockhart@fourpalms.org> writes:

But since the construct is not allowed (or useless), why would
anyone feel they need to use it?

Because it isn't entirely useless, actually. I agree that no
programmer in his right mind would write, by hand, a comparison
involving NULL, knowing that the truth value of that comparison is
required by the standard to be UNKNOWN (i.e. NULL). However, I'm
looking at using machine generated SQL code (generated on the fly in
an interactive application) to implement a dynamically adapting set of
tables, rules/triggers and their supporting stored procedures, and
it's just a matter of time before the first "= NULL" happens to show
up in code generated like that. I'd like it to behave according to
the standard when that situation occurs, and the standard says that
any comparison with NULL, even "NULL = NULL", is UNKNOWN.

-tih
--
The basic difference is this: hackers build things, crackers break them.

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Lockhart (#11)
Re: Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Thomas Lockhart <lockhart@fourpalms.org> writes:

Clearly it is not the case that "this kluge surprises everyone except
those who already know it exists."

How can you argue that, when the topic comes up again every couple of
months?

regards, tom lane

#16Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Zeugswetter Andreas SB (#1)
Re: Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Clearly it is not the case that "this kluge surprises everyone except
those who already know it exists."

How can you argue that, when the topic comes up again every couple of
months?

I'm not looking for an argument. But the statement is not factually
true: "suprises everyone" (most folks don't notice, and don't care much)
and "every couple of months" (??) stray far enough from facts that we
should get back on topic. Whatever the facts, the messages that we do
*not* get are just as significant: a relatively large portion of
feedback from users of both Access and PostgreSQL at the time the
"feature" was added indicated that it was a significant stumbling block
for those users, and those messages will start up anew without adequate
planning and implementation.

Since back then, we have some additional clarification that it occurs
only for users of Access/Forms, and that others are worried that it will
lead to difficulties for others under different circumstances (please
remember, or note if it is too long ago, that at the time I argued
against adding the "feature", but tried to listen to those who would
find it useful). I'm not ignoring those worries, but in the battle
between purity and usefulness we shouldn't always line up with the
strongest or loudest voice(s) that day, but try to respect and keep in
mind the others who have contributed to the discussion in the past.

That said, the issues should be broken down into managable chunks, but
imho forcing the last step (removal of the "= NULL" accomodation) first
is premature. How about phasing this so that we can accomodate the
long-standing issue of M$ interoperability (via ODBC improvements to
make it transparent) before ripping out the current workaround?

From Andreas' comments, it seems that for his application he would like
a different behavior, but frankly I'm not certain why the current
behavior would be detrimental in the use case he mentioned. If SQL92
requires that any query with "= NULL" be rejected as illegal (my books
aren't nearby at the moment), he might still want to have code at the
application level for some graceful handling of illegal values.

- Thomas

#17Joe Conway
joseph.conway@home.com
In reply to: Zeugswetter Andreas SB (#1)
1 attachment(s)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

IS [NOT] NULL is handled a little differently: gram.y generates a
specialized Expr node, which parse_expr.c translates to a function call
on the specialized functions nullvalue() and nonnullvalue()
respectively. I don't much care for this implementation either, again
partly because ruleutils.c has to be uglified to deal with it, but
partly because the optimizer can't cheaply recognize IS NULL tests
either.

I'd like to see all eight of these guys translated into a specialized
kind of expression node, called perhaps BooleanTest. Actually, it'd
probably be wise to keep IS NULL separate from the six boolean tests,
with an eye to the future when it will need to support nonscalar
arguments. So maybe BooleanTest and NullTest node types, each with a
field showing exactly which test is wanted.

Attached is a patch for a new NullTest node type for review and comment.
Since it didn't seem like there was consensus regarding removal of the "a =
null" conversion to "a is null" behavior, I left it in. It is worth
mentioning, however, that neither Oracle 8.1.6 or MSSQL 7 seem to support
this -- see below:

Oracle:
****************************************
SQL> select f1,f2 from foo where f2 = null;

no rows selected

MSSQL 7
****************************************
select f1,f2 from foo where f2 = null
f1 f2
----------- --------------------------------------------------

(0 row(s) affected)

PostgreSQL
****************************************
test=# select f1,f2 from foo where f2 = null;
f1 | f2
----+----
1 |
4 |
(2 rows)

In all 3 cases table foo has 4 rows, 2 of which have null values for f2.
Based on this, should support for the converting "a = null" to "a is null"
be dropped?

I also noticed that in PostgreSQL I can do the following (both before and
after this patch):
select f2 is null from foo;
whereas in both Oracle and MSSQL it causes a syntax error. Any thoughts on
this?

Thanks,

-- Joe

Attachments:

nulltest01.diffapplication/octet-stream; name=nulltest01.diffDownload
diff -Naur pgsql.virg/src/backend/executor/execQual.c pgsql.dev/src/backend/executor/execQual.c
--- pgsql.virg/src/backend/executor/execQual.c	Sat Jun 16 18:56:42 2001
+++ pgsql.dev/src/backend/executor/execQual.c	Mon Jun 18 06:15:59 2001
@@ -62,7 +62,8 @@
 static Datum ExecEvalOr(Expr *orExpr, ExprContext *econtext, bool *isNull);
 static Datum ExecEvalCase(CaseExpr *caseExpr, ExprContext *econtext,
 			 bool *isNull, ExprDoneCond *isDone);
-
+static Datum ExecEvalNullTest(NullTest *ntest, ExprContext *econtext,
+			 bool *isNull, ExprDoneCond *isDone);
 
 /*----------
  *	  ExecEvalArrayRef
@@ -1092,6 +1093,46 @@
 }
 
 /* ----------------------------------------------------------------
+ *		ExecEvalNullTest
+ *
+ *		Evaluate a NullTest node.
+ * ----------------------------------------------------------------
+ */
+static Datum
+ExecEvalNullTest(NullTest *ntest,
+					ExprContext *econtext,
+					bool *isNull,
+					ExprDoneCond *isDone)
+{
+	Datum		result;
+
+	result = ExecEvalExpr(ntest->arg, econtext, isNull, isDone);
+	if (*isNull && ntest->nulltesttype == IS_NULL)
+	{
+		*isNull = FALSE;
+		return BoolGetDatum(TRUE);
+	}
+	else if (*isNull && ntest->nulltesttype == IS_NOT_NULL)
+	{
+		*isNull = FALSE;
+		return BoolGetDatum(FALSE);
+	}
+	else if (!*isNull && ntest->nulltesttype == IS_NULL)
+	{
+		return BoolGetDatum(FALSE);
+	}
+	else if (!*isNull && ntest->nulltesttype == IS_NOT_NULL)
+	{
+		return BoolGetDatum(TRUE);
+	}
+	else									/* should never get here */ 
+	{
+		elog(ERROR, "NullTest node: something unexpected happened!");
+		return BoolGetDatum(NULL);
+	}
+}
+
+/* ----------------------------------------------------------------
  *		ExecEvalFieldSelect
  *
  *		Evaluate a FieldSelect node.
@@ -1262,6 +1303,12 @@
 			break;
 		case T_CaseExpr:
 			retDatum = ExecEvalCase((CaseExpr *) expression,
+									econtext,
+									isNull,
+									isDone);
+			break;
+		case T_NullTest:
+			retDatum = ExecEvalNullTest((NullTest *) expression,
 									econtext,
 									isNull,
 									isDone);
diff -Naur pgsql.virg/src/backend/nodes/copyfuncs.c pgsql.dev/src/backend/nodes/copyfuncs.c
--- pgsql.virg/src/backend/nodes/copyfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/copyfuncs.c	Mon Jun 18 05:11:36 2001
@@ -1017,6 +1017,24 @@
 	return newnode;
 }
 
+/* ----------------
+ *		_copyNullTest
+ * ----------------
+ */
+static NullTest *
+_copyNullTest(NullTest *from)
+{
+	NullTest *newnode = makeNode(NullTest);
+
+	/*
+	 * copy remainder of node
+	 */
+	Node_Copy(from, newnode, arg);
+	newnode->nulltesttype = from->nulltesttype;
+
+	return newnode;
+}
+
 static ArrayRef *
 _copyArrayRef(ArrayRef *from)
 {
@@ -2953,6 +2971,9 @@
 			break;
 		case T_CaseWhen:
 			retval = _copyCaseWhen(from);
+			break;
+		case T_NullTest:
+			retval = _copyNullTest(from);
 			break;
 		case T_FkConstraint:
 			retval = _copyFkConstraint(from);
diff -Naur pgsql.virg/src/backend/nodes/equalfuncs.c pgsql.dev/src/backend/nodes/equalfuncs.c
--- pgsql.virg/src/backend/nodes/equalfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/equalfuncs.c	Mon Jun 18 04:57:58 2001
@@ -1712,6 +1712,16 @@
 	return true;
 }
 
+static bool
+_equalNullTest(NullTest *a, NullTest *b)
+{
+	if (!equal(a->arg, b->arg))
+		return false;
+	if (a->nulltesttype != b->nulltesttype)
+		return false;
+	return true;
+}
+
 /*
  * Stuff from pg_list.h
  */
@@ -2119,6 +2129,9 @@
 			break;
 		case T_CaseWhen:
 			retval = _equalCaseWhen(a, b);
+			break;
+		case T_NullTest:
+			retval = _equalNullTest(a, b);
 			break;
 		case T_FkConstraint:
 			retval = _equalFkConstraint(a, b);
diff -Naur pgsql.virg/src/backend/nodes/outfuncs.c pgsql.dev/src/backend/nodes/outfuncs.c
--- pgsql.virg/src/backend/nodes/outfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/outfuncs.c	Mon Jun 18 04:55:49 2001
@@ -1403,6 +1403,19 @@
 }
 
 /*
+ *	NullTest
+ */
+static void
+_outNullTest(StringInfo str, NullTest *node)
+{
+	appendStringInfo(str, " NULLTEST :arg ");
+	_outNode(str, node->arg);
+
+	appendStringInfo(str, " :nulltesttype %d ",
+				   node->nulltesttype);
+}
+
+/*
  * _outNode -
  *	  converts a Node into ascii string and append it to 'str'
  */
@@ -1639,7 +1652,9 @@
 			case T_CaseWhen:
 				_outCaseWhen(str, obj);
 				break;
-
+			case T_NullTest:
+				_outNullTest(str, obj);
+				break;
 			case T_VariableSetStmt:
 				break;
 			case T_SelectStmt:
diff -Naur pgsql.virg/src/backend/nodes/readfuncs.c pgsql.dev/src/backend/nodes/readfuncs.c
--- pgsql.virg/src/backend/nodes/readfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/readfuncs.c	Mon Jun 18 04:54:06 2001
@@ -860,6 +860,31 @@
 }
 
 /* ----------------
+ *		_readNullTest
+ *
+ *	NullTest is a subclass of Node
+ * ----------------
+ */
+static NullTest *
+_readNullTest(void)
+{
+	NullTest	*local_node;
+	char		*token;
+	int			length;
+
+	local_node = makeNode(NullTest);
+
+	token = pg_strtok(&length); /* eat :arg */
+	local_node->arg = nodeRead(true);	/* now read it */
+
+	token = pg_strtok(&length); /* eat :nulltesttype */
+	token = pg_strtok(&length); /* get nulltesttype */
+	local_node->nulltesttype = (NullTestType) atoi(token);
+
+	return local_node;
+}
+
+/* ----------------
  *		_readVar
  *
  *	Var is a subclass of Expr
@@ -1966,6 +1991,8 @@
 		return_value = _readCaseExpr();
 	else if (length == 4 && strncmp(token, "WHEN", length) == 0)
 		return_value = _readCaseWhen();
+	else if (length == 8 && strncmp(token, "NULLTEST", length) == 0)
+		return_value = _readNullTest();
 	else
 		elog(ERROR, "badly formatted planstring \"%.10s\"...", token);
 
diff -Naur pgsql.virg/src/backend/optimizer/util/clauses.c pgsql.dev/src/backend/optimizer/util/clauses.c
--- pgsql.virg/src/backend/optimizer/util/clauses.c	Sat Jun 16 18:56:46 2001
+++ pgsql.dev/src/backend/optimizer/util/clauses.c	Mon Jun 18 04:50:10 2001
@@ -1580,6 +1580,8 @@
 					return true;
 			}
 			break;
+		case T_NullTest:
+			return walker(((NullTest *) node)->arg, context);
 		case T_SubLink:
 			{
 				SubLink    *sublink = (SubLink *) node;
@@ -1930,6 +1932,16 @@
 				FLATCOPY(newnode, casewhen, CaseWhen);
 				MUTATE(newnode->expr, casewhen->expr, Node *);
 				MUTATE(newnode->result, casewhen->result, Node *);
+				return (Node *) newnode;
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest *ntest = (NullTest *) node;
+				NullTest *newnode;
+
+				FLATCOPY(newnode, ntest, NullTest);
+				MUTATE(newnode->arg, ntest->arg, Node *);
 				return (Node *) newnode;
 			}
 			break;
diff -Naur pgsql.virg/src/backend/parser/gram.y pgsql.dev/src/backend/parser/gram.y
--- pgsql.virg/src/backend/parser/gram.y	Sat Jun 16 18:56:47 2001
+++ pgsql.dev/src/backend/parser/gram.y	Mon Jun 18 05:25:02 2001
@@ -4434,9 +4434,19 @@
 					 * (like Microsoft's).  Turn these into IS NULL exprs.
 					 */
 					if (exprIsNullConstant($3))
-						$$ = makeA_Expr(ISNULL, NULL, $1, NULL);
+						{
+							NullTest *n = makeNode(NullTest);
+							n->arg = $1;
+							n->nulltesttype = IS_NULL;
+							$$ = (Node *)n;
+						}
 					else if (exprIsNullConstant($1))
-						$$ = makeA_Expr(ISNULL, NULL, $3, NULL);
+						{
+							NullTest *n = makeNode(NullTest);
+							n->arg = $3;
+							n->nulltesttype = IS_NULL;
+							$$ = (Node *)n;
+						}
 					else
 						$$ = makeA_Expr(OP, "=", $1, $3);
 				}
@@ -4499,15 +4509,43 @@
 					n->agg_distinct = FALSE;
 					$$ = makeA_Expr(OP, "!~~*", $1, (Node *) n);
 				}
-
+		/* NullTest clause
+		 * Define SQL92-style Null test clause.
+		 * Allow all two forms described in the standard:
+		 *  a IS NULL
+		 *  a IS NOT NULL
+		 * Allow two SQL extensions
+		 *  a ISNULL
+		 *  a NOTNULL
+		 */
 		| a_expr ISNULL
-				{	$$ = makeA_Expr(ISNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NULL;
+					$$ = (Node *)n;
+				}
 		| a_expr IS NULL_P
-				{	$$ = makeA_Expr(ISNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NULL;
+					$$ = (Node *)n;
+				}
 		| a_expr NOTNULL
-				{	$$ = makeA_Expr(NOTNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NOT_NULL;
+					$$ = (Node *)n;
+				}
 		| a_expr IS NOT NULL_P
-				{	$$ = makeA_Expr(NOTNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NOT_NULL;
+					$$ = (Node *)n;
+				}
 		/* IS TRUE, IS FALSE, etc used to be function calls
 		 *  but let's make them expressions to allow the optimizer
 		 *  a chance to eliminate them if a_expr is a constant string.
diff -Naur pgsql.virg/src/backend/parser/parse_expr.c pgsql.dev/src/backend/parser/parse_expr.c
--- pgsql.virg/src/backend/parser/parse_expr.c	Sat Jun 16 18:56:47 2001
+++ pgsql.dev/src/backend/parser/parse_expr.c	Mon Jun 18 05:10:49 2001
@@ -510,6 +510,16 @@
 				break;
 			}
 
+		case T_NullTest:
+			{
+				NullTest   *n = (NullTest *) expr;
+
+				n->arg = transformExpr(pstate, n->arg, precedence);
+
+				result = (Node *) expr;
+				break;
+			}
+
 			/*
 			 * Quietly accept node types that may be presented when we are
 			 * called on an already-transformed tree.
@@ -668,6 +678,9 @@
 			break;
 		case T_CaseWhen:
 			type = exprType(((CaseWhen *) expr)->result);
+			break;
+		case T_NullTest:
+			type = BOOLOID;
 			break;
 		case T_Ident:
 			/* is this right? */
diff -Naur pgsql.virg/src/backend/utils/adt/ruleutils.c pgsql.dev/src/backend/utils/adt/ruleutils.c
--- pgsql.virg/src/backend/utils/adt/ruleutils.c	Sat Jun 16 18:56:48 2001
+++ pgsql.dev/src/backend/utils/adt/ruleutils.c	Sat Jun 16 23:36:15 2001
@@ -1891,6 +1891,21 @@
 			}
 			break;
 
+		case T_NullTest:
+			{
+				NullTest		*ntest = (NullTest *) node;
+
+				get_rule_expr(ntest->arg, context);
+				if (ntest->nulltesttype == IS_NULL)
+					appendStringInfo(buf, "%s", " IS NULL ");
+				else if (ntest->nulltesttype == IS_NOT_NULL)
+					appendStringInfo(buf, "%s", " IS NOT NULL ");
+				else	/* should never happen */
+					elog(ERROR, "invalid NullTest type %d",
+						 ntest->nulltesttype);
+			}
+			break;
+
 		case T_RelabelType:
 			{
 				RelabelType *relabel = (RelabelType *) node;
diff -Naur pgsql.virg/src/include/nodes/nodes.h pgsql.dev/src/include/nodes/nodes.h
--- pgsql.virg/src/include/nodes/nodes.h	Sat Jun 16 18:56:52 2001
+++ pgsql.dev/src/include/nodes/nodes.h	Mon Jun 18 04:35:25 2001
@@ -225,6 +225,7 @@
 	T_RowMarkXXX,				/* not used anymore; tag# available */
 	T_FkConstraint,
 	T_PrivGrantee,
+	T_NullTest,
 
 	/*
 	 * TAGS FOR FUNCTION-CALL CONTEXT AND RESULTINFO NODES (see fmgr.h)
diff -Naur pgsql.virg/src/include/nodes/parsenodes.h pgsql.dev/src/include/nodes/parsenodes.h
--- pgsql.virg/src/include/nodes/parsenodes.h	Sat Jun 16 18:56:52 2001
+++ pgsql.dev/src/include/nodes/parsenodes.h	Mon Jun 18 04:39:00 2001
@@ -1054,6 +1054,28 @@
 	Node	   *result;			/* substitution result */
 } CaseWhen;
 
+/* ----------------
+ * NullTest
+ *
+ * NullTest represents the operation of comparing a value to a NULL
+ * value.  At runtime, the input expression is expected to be IS NULL,
+ * IS NOT NULL, ISNULL, or NOTNULL.
+ * The appropriate test is performed and returned as a Datum.
+ * ----------------
+ */
+
+typedef enum NullTestType
+{
+	IS_NULL, IS_NOT_NULL
+} NullTestType;
+
+typedef struct NullTest
+{
+	NodeTag			type;
+	Node			*arg;			/* input expression */
+	NullTestType	nulltesttype;	/* IS NULL, IS NOT NULL */
+} NullTest;
+
 /*
  * ColumnDef - column definition (used in various creates)
  *
#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#17)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

"Joe Conway" <joseph.conway@home.com> writes:

Attached is a patch for a new NullTest node type for review and comment.

I assume you are just looking for review at this point; I would not
recommend applying to CVS until the BooleanTest part is done too.
(Since parsetree changes affect stored rules, the change really should
include a catversion.h increment, and thus it's best to bunch this sort
of change together to avoid forcing extra initdbs on other hackers.)
I'll look through the code later, but...

Based on this, should support for the converting "a = null" to "a is null"
be dropped?

My opinion on that is already on record ;-)

I also noticed that in PostgreSQL I can do the following (both before and
after this patch):
select f2 is null from foo;
whereas in both Oracle and MSSQL it causes a syntax error. Any thoughts on
this?

Curious; I'd have said that that is clearly within the spec. Can anyone
check it on some other engines?

regards, tom lane

#19Joe Conway
joseph.conway@home.com
In reply to: Zeugswetter Andreas SB (#1)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

I assume you are just looking for review at this point; I would not
recommend applying to CVS until the BooleanTest part is done too.
(Since parsetree changes affect stored rules, the change really should
include a catversion.h increment, and thus it's best to bunch this sort
of change together to avoid forcing extra initdbs on other hackers.)
I'll look through the code later, but...

OK -- I'll get started on BooleanTest in the next day or two.

-- Joe

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#17)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

"Joe Conway" <joseph.conway@home.com> writes:

I also noticed that in PostgreSQL I can do the following (both before and
after this patch):
select f2 is null from foo;
whereas in both Oracle and MSSQL it causes a syntax error. Any thoughts on
this?

I dug into this further and discovered that indeed it is not SQL92
... but it is SQL99. Amazingly enough, SQL92 doesn't allow boolean
expressions as a possible type of general expression:

<value expression> ::=
<numeric value expression>
| <string value expression>
| <datetime value expression>
| <interval value expression>

It only allows them as <search condition>s, which is to say WHERE,
HAVING, CASE WHEN, CHECK, and one or two other places.

But SQL99 gets it right:

<value expression> ::=
<numeric value expression>
| <string value expression>
| <datetime value expression>
| <interval value expression>
| <boolean value expression>
| <user-defined type value expression>
| <row value expression>
| <reference value expression>
| <collection value expression>

Looks like we're ahead of the curve here...

regards, tom lane

#21Joe Conway
joseph.conway@home.com
In reply to: Zeugswetter Andreas SB (#1)
1 attachment(s)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

I assume you are just looking for review at this point; I would not
recommend applying to CVS until the BooleanTest part is done too.
(Since parsetree changes affect stored rules, the change really should
include a catversion.h increment, and thus it's best to bunch this sort
of change together to avoid forcing extra initdbs on other hackers.)
I'll look through the code later, but...

OK -- here's the full patch for review. It includes two new expression node
types: NullTest and BooleanTest. I have a couple of questions/comments
remaining WRT this:

-- Should I increment catversion.h as part of the patch (I didn't in this
patch), or is that usually centrally controlled by Bruce (or whomever
commits the change)?

-- IMHO, if we are going to keep the (a = null) to (a is null) conversion,
then there should also be a similar conversion from (a != null) to (a is not
null). Otherwise the two operations which may be expected to be
complimentary (as evidenced by at least one recent post) are not.

-- If I have interpreted SQL92 correctly UNKNOWN IS TRUE should return
FALSE, and UNKNOWN IS NOT TRUE is equivalent to NOT (UNKNOWN IS TRUE) ==>
TRUE. Is this correct?

Thanks,

-- Joe

Attachments:

newnodes01.diffapplication/octet-stream; name=newnodes01.diffDownload
diff -Naur pgsql.virg/src/backend/executor/execQual.c pgsql.dev/src/backend/executor/execQual.c
--- pgsql.virg/src/backend/executor/execQual.c	Sat Jun 16 18:56:42 2001
+++ pgsql.dev/src/backend/executor/execQual.c	Tue Jun 19 05:26:22 2001
@@ -62,7 +62,10 @@
 static Datum ExecEvalOr(Expr *orExpr, ExprContext *econtext, bool *isNull);
 static Datum ExecEvalCase(CaseExpr *caseExpr, ExprContext *econtext,
 			 bool *isNull, ExprDoneCond *isDone);
-
+static Datum ExecEvalNullTest(NullTest *ntest, ExprContext *econtext,
+			 bool *isNull, ExprDoneCond *isDone);
+static Datum ExecEvalBooleanTest(BooleanTest *btest, ExprContext *econtext,
+			 bool *isNull, ExprDoneCond *isDone);
 
 /*----------
  *	  ExecEvalArrayRef
@@ -1092,6 +1095,110 @@
 }
 
 /* ----------------------------------------------------------------
+ *		ExecEvalNullTest
+ *
+ *		Evaluate a NullTest node.
+ * ----------------------------------------------------------------
+ */
+static Datum
+ExecEvalNullTest(NullTest *ntest,
+					ExprContext *econtext,
+					bool *isNull,
+					ExprDoneCond *isDone)
+{
+	Datum		result;
+
+	result = ExecEvalExpr(ntest->arg, econtext, isNull, isDone);
+    switch (ntest->nulltesttype)
+    {
+        case IS_NULL:
+            if (*isNull)
+            {
+                *isNull = false;
+                return BoolGetDatum(true);
+            }
+            else
+                return BoolGetDatum(false);
+        case IS_NOT_NULL:
+            if (*isNull)
+            {
+                *isNull = false;
+                return BoolGetDatum(false);
+            }
+            else
+                return BoolGetDatum(true);
+        default:
+            elog(ERROR, "ExecEvalNullTest: unexpected nulltesttype %d",
+                 (int) ntest->nulltesttype);
+            return (Datum) 0;  /* keep compiler quiet */
+    }
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEvalBooleanTest
+ *
+ *		Evaluate a BooleanTest node.
+ * ----------------------------------------------------------------
+ */
+static Datum
+ExecEvalBooleanTest(BooleanTest *btest,
+					ExprContext *econtext,
+					bool *isNull,
+					ExprDoneCond *isDone)
+{
+	Datum		result;
+
+	result = ExecEvalExpr(btest->arg, econtext, isNull, isDone);
+    switch (btest->booltesttype)
+    {
+        case IS_TRUE:
+            if (*isNull)
+            {
+                *isNull = false;
+                return BoolGetDatum(false);
+            }
+            else if (result == true)
+                return BoolGetDatum(true);
+			else
+                return BoolGetDatum(false);
+        case IS_NOT_FALSE:
+            if (*isNull)
+            {
+                *isNull = false;
+                return BoolGetDatum(true);
+            }
+            else if (result == true)
+                return BoolGetDatum(true);
+			else
+                return BoolGetDatum(false);
+        case IS_FALSE:
+            if (*isNull)
+            {
+                *isNull = false;
+                return BoolGetDatum(false);
+            }
+            else if (result == false)
+                return BoolGetDatum(true);
+			else
+                return BoolGetDatum(false);
+        case IS_NOT_TRUE:
+            if (*isNull)
+            {
+                *isNull = false;
+                return BoolGetDatum(true);
+            }
+            else if (result == false)
+                return BoolGetDatum(true);
+			else
+                return BoolGetDatum(false);
+        default:
+            elog(ERROR, "ExecEvalBooleanTest: unexpected booltesttype %d",
+                 (int) btest->booltesttype);
+            return (Datum) 0;  /* keep compiler quiet */
+    }
+}
+
+/* ----------------------------------------------------------------
  *		ExecEvalFieldSelect
  *
  *		Evaluate a FieldSelect node.
@@ -1262,6 +1369,18 @@
 			break;
 		case T_CaseExpr:
 			retDatum = ExecEvalCase((CaseExpr *) expression,
+									econtext,
+									isNull,
+									isDone);
+			break;
+		case T_NullTest:
+			retDatum = ExecEvalNullTest((NullTest *) expression,
+									econtext,
+									isNull,
+									isDone);
+			break;
+		case T_BooleanTest:
+			retDatum = ExecEvalBooleanTest((BooleanTest *) expression,
 									econtext,
 									isNull,
 									isDone);
diff -Naur pgsql.virg/src/backend/nodes/copyfuncs.c pgsql.dev/src/backend/nodes/copyfuncs.c
--- pgsql.virg/src/backend/nodes/copyfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/copyfuncs.c	Tue Jun 19 05:05:22 2001
@@ -1017,6 +1017,42 @@
 	return newnode;
 }
 
+/* ----------------
+ *		_copyNullTest
+ * ----------------
+ */
+static NullTest *
+_copyNullTest(NullTest *from)
+{
+	NullTest *newnode = makeNode(NullTest);
+
+	/*
+	 * copy remainder of node
+	 */
+	Node_Copy(from, newnode, arg);
+	newnode->nulltesttype = from->nulltesttype;
+
+	return newnode;
+}
+
+/* ----------------
+ *		_copyBooleanTest
+ * ----------------
+ */
+static BooleanTest *
+_copyBooleanTest(BooleanTest *from)
+{
+	BooleanTest *newnode = makeNode(BooleanTest);
+
+	/*
+	 * copy remainder of node
+	 */
+	Node_Copy(from, newnode, arg);
+	newnode->booltesttype = from->booltesttype;
+
+	return newnode;
+}
+
 static ArrayRef *
 _copyArrayRef(ArrayRef *from)
 {
@@ -2953,6 +2989,12 @@
 			break;
 		case T_CaseWhen:
 			retval = _copyCaseWhen(from);
+			break;
+		case T_NullTest:
+			retval = _copyNullTest(from);
+			break;
+		case T_BooleanTest:
+			retval = _copyBooleanTest(from);
 			break;
 		case T_FkConstraint:
 			retval = _copyFkConstraint(from);
diff -Naur pgsql.virg/src/backend/nodes/equalfuncs.c pgsql.dev/src/backend/nodes/equalfuncs.c
--- pgsql.virg/src/backend/nodes/equalfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/equalfuncs.c	Tue Jun 19 05:12:10 2001
@@ -1712,6 +1712,26 @@
 	return true;
 }
 
+static bool
+_equalNullTest(NullTest *a, NullTest *b)
+{
+	if (!equal(a->arg, b->arg))
+		return false;
+	if (a->nulltesttype != b->nulltesttype)
+		return false;
+	return true;
+}
+
+static bool
+_equalBooleanTest(BooleanTest *a, BooleanTest *b)
+{
+	if (!equal(a->arg, b->arg))
+		return false;
+	if (a->booltesttype != b->booltesttype)
+		return false;
+	return true;
+}
+
 /*
  * Stuff from pg_list.h
  */
@@ -2119,6 +2139,12 @@
 			break;
 		case T_CaseWhen:
 			retval = _equalCaseWhen(a, b);
+			break;
+		case T_NullTest:
+			retval = _equalNullTest(a, b);
+			break;
+		case T_BooleanTest:
+			retval = _equalBooleanTest(a, b);
 			break;
 		case T_FkConstraint:
 			retval = _equalFkConstraint(a, b);
diff -Naur pgsql.virg/src/backend/nodes/outfuncs.c pgsql.dev/src/backend/nodes/outfuncs.c
--- pgsql.virg/src/backend/nodes/outfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/outfuncs.c	Tue Jun 19 05:11:03 2001
@@ -1403,6 +1403,32 @@
 }
 
 /*
+ *	NullTest
+ */
+static void
+_outNullTest(StringInfo str, NullTest *node)
+{
+	appendStringInfo(str, " NULLTEST :arg ");
+	_outNode(str, node->arg);
+
+	appendStringInfo(str, " :nulltesttype %d ",
+				   node->nulltesttype);
+}
+
+/*
+ *	BooleanTest
+ */
+static void
+_outBooleanTest(StringInfo str, BooleanTest *node)
+{
+	appendStringInfo(str, " BOOLEANTEST :arg ");
+	_outNode(str, node->arg);
+
+	appendStringInfo(str, " :booltesttype %d ",
+				   node->booltesttype);
+}
+
+/*
  * _outNode -
  *	  converts a Node into ascii string and append it to 'str'
  */
@@ -1639,7 +1665,12 @@
 			case T_CaseWhen:
 				_outCaseWhen(str, obj);
 				break;
-
+			case T_NullTest:
+				_outNullTest(str, obj);
+				break;
+			case T_BooleanTest:
+				_outBooleanTest(str, obj);
+				break;
 			case T_VariableSetStmt:
 				break;
 			case T_SelectStmt:
diff -Naur pgsql.virg/src/backend/nodes/readfuncs.c pgsql.dev/src/backend/nodes/readfuncs.c
--- pgsql.virg/src/backend/nodes/readfuncs.c	Sat Jun 16 18:56:45 2001
+++ pgsql.dev/src/backend/nodes/readfuncs.c	Tue Jun 19 05:27:26 2001
@@ -860,6 +860,56 @@
 }
 
 /* ----------------
+ *		_readNullTest
+ *
+ *	NullTest is a subclass of Node
+ * ----------------
+ */
+static NullTest *
+_readNullTest(void)
+{
+	NullTest	*local_node;
+	char		*token;
+	int			length;
+
+	local_node = makeNode(NullTest);
+
+	token = pg_strtok(&length); /* eat :arg */
+	local_node->arg = nodeRead(true);	/* now read it */
+
+	token = pg_strtok(&length); /* eat :nulltesttype */
+	token = pg_strtok(&length); /* get nulltesttype */
+	local_node->nulltesttype = (NullTestType) atoi(token);
+
+	return local_node;
+}
+
+/* ----------------
+ *		_readBooleanTest
+ *
+ *	BooleanTest is a subclass of Node
+ * ----------------
+ */
+static BooleanTest *
+_readBooleanTest(void)
+{
+	BooleanTest	*local_node;
+	char		*token;
+	int			length;
+
+	local_node = makeNode(BooleanTest);
+
+	token = pg_strtok(&length); /* eat :arg */
+	local_node->arg = nodeRead(true);	/* now read it */
+
+	token = pg_strtok(&length); /* eat :booltesttype */
+	token = pg_strtok(&length); /* get booltesttype */
+	local_node->booltesttype = (BoolTestType) atoi(token);
+
+	return local_node;
+}
+
+/* ----------------
  *		_readVar
  *
  *	Var is a subclass of Expr
@@ -1966,6 +2016,10 @@
 		return_value = _readCaseExpr();
 	else if (length == 4 && strncmp(token, "WHEN", length) == 0)
 		return_value = _readCaseWhen();
+	else if (length == 8 && strncmp(token, "NULLTEST", length) == 0)
+		return_value = _readNullTest();
+	else if (length == 11 && strncmp(token, "BOOLEANTEST", length) == 0)
+		return_value = _readBooleanTest();
 	else
 		elog(ERROR, "badly formatted planstring \"%.10s\"...", token);
 
diff -Naur pgsql.virg/src/backend/optimizer/util/clauses.c pgsql.dev/src/backend/optimizer/util/clauses.c
--- pgsql.virg/src/backend/optimizer/util/clauses.c	Sat Jun 16 18:56:46 2001
+++ pgsql.dev/src/backend/optimizer/util/clauses.c	Tue Jun 19 05:02:53 2001
@@ -1580,6 +1580,10 @@
 					return true;
 			}
 			break;
+		case T_NullTest:
+			return walker(((NullTest *) node)->arg, context);
+		case T_BooleanTest:
+			return walker(((BooleanTest *) node)->arg, context);
 		case T_SubLink:
 			{
 				SubLink    *sublink = (SubLink *) node;
@@ -1930,6 +1934,26 @@
 				FLATCOPY(newnode, casewhen, CaseWhen);
 				MUTATE(newnode->expr, casewhen->expr, Node *);
 				MUTATE(newnode->result, casewhen->result, Node *);
+				return (Node *) newnode;
+			}
+			break;
+		case T_NullTest:
+			{
+				NullTest *ntest = (NullTest *) node;
+				NullTest *newnode;
+
+				FLATCOPY(newnode, ntest, NullTest);
+				MUTATE(newnode->arg, ntest->arg, Node *);
+				return (Node *) newnode;
+			}
+			break;
+		case T_BooleanTest:
+			{
+				BooleanTest *btest = (BooleanTest *) node;
+				BooleanTest *newnode;
+
+				FLATCOPY(newnode, btest, BooleanTest);
+				MUTATE(newnode->arg, btest->arg, Node *);
 				return (Node *) newnode;
 			}
 			break;
diff -Naur pgsql.virg/src/backend/parser/gram.y pgsql.dev/src/backend/parser/gram.y
--- pgsql.virg/src/backend/parser/gram.y	Sat Jun 16 18:56:47 2001
+++ pgsql.dev/src/backend/parser/gram.y	Tue Jun 19 04:46:06 2001
@@ -4434,9 +4434,19 @@
 					 * (like Microsoft's).  Turn these into IS NULL exprs.
 					 */
 					if (exprIsNullConstant($3))
-						$$ = makeA_Expr(ISNULL, NULL, $1, NULL);
+						{
+							NullTest *n = makeNode(NullTest);
+							n->arg = $1;
+							n->nulltesttype = IS_NULL;
+							$$ = (Node *)n;
+						}
 					else if (exprIsNullConstant($1))
-						$$ = makeA_Expr(ISNULL, NULL, $3, NULL);
+						{
+							NullTest *n = makeNode(NullTest);
+							n->arg = $3;
+							n->nulltesttype = IS_NULL;
+							$$ = (Node *)n;
+						}
 					else
 						$$ = makeA_Expr(OP, "=", $1, $3);
 				}
@@ -4499,59 +4509,79 @@
 					n->agg_distinct = FALSE;
 					$$ = makeA_Expr(OP, "!~~*", $1, (Node *) n);
 				}
-
+		/* NullTest clause
+		 * Define SQL92-style Null test clause.
+		 * Allow two forms described in the standard:
+		 *  a IS NULL
+		 *  a IS NOT NULL
+		 * Allow two SQL extensions
+		 *  a ISNULL
+		 *  a NOTNULL
+		 */
 		| a_expr ISNULL
-				{	$$ = makeA_Expr(ISNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NULL;
+					$$ = (Node *)n;
+				}
 		| a_expr IS NULL_P
-				{	$$ = makeA_Expr(ISNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NULL;
+					$$ = (Node *)n;
+				}
 		| a_expr NOTNULL
-				{	$$ = makeA_Expr(NOTNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NOT_NULL;
+					$$ = (Node *)n;
+				}
 		| a_expr IS NOT NULL_P
-				{	$$ = makeA_Expr(NOTNULL, NULL, $1, NULL); }
+				{
+					NullTest *n = makeNode(NullTest);
+					n->arg = $1;
+					n->nulltesttype = IS_NOT_NULL;
+					$$ = (Node *)n;
+				}
 		/* IS TRUE, IS FALSE, etc used to be function calls
 		 *  but let's make them expressions to allow the optimizer
 		 *  a chance to eliminate them if a_expr is a constant string.
 		 * - thomas 1997-12-22
+		 *
+		 *  Created BooleanTest Node type, and changed handling
+		 *  for NULL inputs
+		 * - jec 2001-06-18
 		 */
 		| a_expr IS TRUE_P
 				{
-					A_Const *n = makeNode(A_Const);
-					n->val.type = T_String;
-					n->val.val.str = "t";
-					n->typename = makeNode(TypeName);
-					n->typename->name = xlateSqlType("bool");
-					n->typename->typmod = -1;
-					$$ = makeA_Expr(OP, "=", $1,(Node *)n);
+					BooleanTest *b = makeNode(BooleanTest);
+					b->arg = $1;
+					b->booltesttype = IS_TRUE;
+					$$ = (Node *)b;
 				}
 		| a_expr IS NOT FALSE_P
 				{
-					A_Const *n = makeNode(A_Const);
-					n->val.type = T_String;
-					n->val.val.str = "t";
-					n->typename = makeNode(TypeName);
-					n->typename->name = xlateSqlType("bool");
-					n->typename->typmod = -1;
-					$$ = makeA_Expr(OP, "=", $1,(Node *)n);
+					BooleanTest *b = makeNode(BooleanTest);
+					b->arg = $1;
+					b->booltesttype = IS_NOT_FALSE;
+					$$ = (Node *)b;
 				}
 		| a_expr IS FALSE_P
 				{
-					A_Const *n = makeNode(A_Const);
-					n->val.type = T_String;
-					n->val.val.str = "f";
-					n->typename = makeNode(TypeName);
-					n->typename->name = xlateSqlType("bool");
-					n->typename->typmod = -1;
-					$$ = makeA_Expr(OP, "=", $1,(Node *)n);
+					BooleanTest *b = makeNode(BooleanTest);
+					b->arg = $1;
+					b->booltesttype = IS_FALSE;
+					$$ = (Node *)b;
 				}
 		| a_expr IS NOT TRUE_P
 				{
-					A_Const *n = makeNode(A_Const);
-					n->val.type = T_String;
-					n->val.val.str = "f";
-					n->typename = makeNode(TypeName);
-					n->typename->name = xlateSqlType("bool");
-					n->typename->typmod = -1;
-					$$ = makeA_Expr(OP, "=", $1,(Node *)n);
+					BooleanTest *b = makeNode(BooleanTest);
+					b->arg = $1;
+					b->booltesttype = IS_NOT_TRUE;
+					$$ = (Node *)b;
 				}
 		| a_expr BETWEEN b_expr AND b_expr			%prec BETWEEN
 				{
diff -Naur pgsql.virg/src/backend/parser/parse_expr.c pgsql.dev/src/backend/parser/parse_expr.c
--- pgsql.virg/src/backend/parser/parse_expr.c	Sat Jun 16 18:56:47 2001
+++ pgsql.dev/src/backend/parser/parse_expr.c	Tue Jun 19 05:01:12 2001
@@ -510,6 +510,26 @@
 				break;
 			}
 
+		case T_NullTest:
+			{
+				NullTest   *n = (NullTest *) expr;
+
+				n->arg = transformExpr(pstate, n->arg, precedence);
+
+				result = (Node *) expr;
+				break;
+			}
+
+		case T_BooleanTest:
+			{
+				BooleanTest   *b = (BooleanTest *) expr;
+
+				b->arg = transformExpr(pstate, b->arg, precedence);
+
+				result = (Node *) expr;
+				break;
+			}
+
 			/*
 			 * Quietly accept node types that may be presented when we are
 			 * called on an already-transformed tree.
@@ -668,6 +688,12 @@
 			break;
 		case T_CaseWhen:
 			type = exprType(((CaseWhen *) expr)->result);
+			break;
+		case T_NullTest:
+			type = BOOLOID;
+			break;
+		case T_BooleanTest:
+			type = BOOLOID;
 			break;
 		case T_Ident:
 			/* is this right? */
diff -Naur pgsql.virg/src/backend/utils/adt/ruleutils.c pgsql.dev/src/backend/utils/adt/ruleutils.c
--- pgsql.virg/src/backend/utils/adt/ruleutils.c	Sat Jun 16 18:56:48 2001
+++ pgsql.dev/src/backend/utils/adt/ruleutils.c	Tue Jun 19 04:59:46 2001
@@ -1948,6 +1948,52 @@
 			}
 			break;
 
+		case T_NullTest:
+			{
+				NullTest		*ntest = (NullTest *) node;
+
+				get_rule_expr(ntest->arg, context);
+			    switch (ntest->nulltesttype)
+			    {
+			        case IS_NULL:
+						appendStringInfo(buf, "%s", " IS NULL ");
+						break;
+			        case IS_NOT_NULL:
+						appendStringInfo(buf, "%s", " IS NOT NULL ");
+						break;
+			        default:
+			            elog(ERROR, "get_rule_expr: unexpected nulltesttype %d",
+			                 (int) ntest->nulltesttype);
+				}
+			}
+			break;
+
+		case T_BooleanTest:
+			{
+				BooleanTest		*btest = (BooleanTest *) node;
+
+				get_rule_expr(btest->arg, context);
+			    switch (btest->booltesttype)
+			    {
+			        case IS_TRUE:
+						appendStringInfo(buf, "%s", " IS TRUE ");
+						break;
+			        case IS_NOT_FALSE:
+						appendStringInfo(buf, "%s", " IS NOT FALSE ");
+						break;
+			        case IS_FALSE:
+						appendStringInfo(buf, "%s", " IS FALSE ");
+						break;
+			        case IS_NOT_TRUE:
+						appendStringInfo(buf, "%s", " IS NOT TRUE ");
+						break;
+			        default:
+			            elog(ERROR, "get_rule_expr: unexpected booltesttype %d",
+			                 (int) btest->booltesttype);
+				}
+			}
+			break;
+
 		case T_SubLink:
 			get_sublink_expr(node, context);
 			break;
diff -Naur pgsql.virg/src/include/nodes/nodes.h pgsql.dev/src/include/nodes/nodes.h
--- pgsql.virg/src/include/nodes/nodes.h	Sat Jun 16 18:56:52 2001
+++ pgsql.dev/src/include/nodes/nodes.h	Tue Jun 19 04:46:52 2001
@@ -225,6 +225,8 @@
 	T_RowMarkXXX,				/* not used anymore; tag# available */
 	T_FkConstraint,
 	T_PrivGrantee,
+	T_NullTest,
+	T_BooleanTest,
 
 	/*
 	 * TAGS FOR FUNCTION-CALL CONTEXT AND RESULTINFO NODES (see fmgr.h)
diff -Naur pgsql.virg/src/include/nodes/parsenodes.h pgsql.dev/src/include/nodes/parsenodes.h
--- pgsql.virg/src/include/nodes/parsenodes.h	Sat Jun 16 18:56:52 2001
+++ pgsql.dev/src/include/nodes/parsenodes.h	Tue Jun 19 05:25:18 2001
@@ -1054,6 +1054,50 @@
 	Node	   *result;			/* substitution result */
 } CaseWhen;
 
+/* ----------------
+ * NullTest
+ *
+ * NullTest represents the operation of comparing a value to a NULL
+ * value.  At runtime, the input expression is expected to be IS NULL,
+ * IS NOT NULL, ISNULL, or NOTNULL.
+ * The appropriate test is performed and returned as a Datum.
+ * ----------------
+ */
+
+typedef enum NullTestType
+{
+	IS_NULL, IS_NOT_NULL
+} NullTestType;
+
+typedef struct NullTest
+{
+	NodeTag			type;
+	Node			*arg;			/* input expression */
+	NullTestType	nulltesttype;	/* IS NULL, IS NOT NULL */
+} NullTest;
+
+/* ----------------
+ * BooleanTest
+ *
+ * BooleanTest represents the operation of comparing a value to a TRUE
+ * or FALSE value.  At runtime, the input expression is expected to be
+ * IS TRUE, IS NOT FALSE, IS FALSE, or IS NOT TRUE.
+ * The appropriate test is performed and returned as a Datum.
+ * ----------------
+ */
+
+typedef enum BoolTestType
+{
+	IS_TRUE, IS_NOT_FALSE, IS_FALSE, IS_NOT_TRUE
+} BoolTestType;
+
+typedef struct BooleanTest
+{
+	NodeTag			type;
+	Node			*arg;			/* input expression */
+	BoolTestType	booltesttype;	/* IS_TRUE, IS_NOT_FALSE, IS_FALSE, IS_NOT_TRUE */
+} BooleanTest;
+
 /*
  * ColumnDef - column definition (used in various creates)
  *
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#21)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

"Joe Conway" <joseph.conway@home.com> writes:

-- Should I increment catversion.h as part of the patch (I didn't in this
patch), or is that usually centrally controlled by Bruce (or whomever
commits the change)?

It's good to put the catversion bump into the patch, else the committer
might forget to do it.

-- IMHO, if we are going to keep the (a = null) to (a is null) conversion,
then there should also be a similar conversion from (a != null) to (a is not
null). Otherwise the two operations which may be expected to be
complimentary (as evidenced by at least one recent post) are not.

I'd resist this. The only reason the =NULL hack is in there at all is
to support Access97. We shouldn't extend the deviation from standards
further than the minimum needed to do that. The hack is fundamentally
inconsistent anyway, and breaking our standards compliance further in
pursuit of bogus consistency seems misguided.

Personally I'd rather take out the =NULL conversion anyway...

-- If I have interpreted SQL92 correctly UNKNOWN IS TRUE should return
FALSE, and UNKNOWN IS NOT TRUE is equivalent to NOT (UNKNOWN IS TRUE) ==>
TRUE. Is this correct?

Yes. Table 15 is pretty illegible in the ASCII draft copies of SQL92
and SQL99, but the PDF version of SQL99 is okay, and it makes clear
what you'd expect:

input IS TRUE IS FALSE IS UNKNOWN

true true false false
false false true false
unknown false false true

and then the NOT variants are defined as

x IS NOT foo == NOT (x IS foo)

I'll try to look over and commit the patch later today.

For extra credit ;-) ... if you'd like to learn a little bit about the
optimizer, think about teaching clause_selectivity() in
optimizer/path/clausesel.c how to estimate the selectivity of these new
expression nodes. In the case where the argument is a boolean column
that we have statistics for, it should be possible to derive the correct
answer (including accounting for NULLs). If the argument is more
complex than that, you probably can't do anything really intelligent,
but you could handwave away NULLs and then compute the appropriate
function of the clause_selectivity() of the argument.

regards, tom lane

#23Joe Conway
joseph.conway@home.com
In reply to: Zeugswetter Andreas SB (#1)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

-- Should I increment catversion.h as part of the patch (I didn't in

this

patch), or is that usually centrally controlled by Bruce (or whomever
commits the change)?

It's good to put the catversion bump into the patch, else the committer
might forget to do it.

OK -- will do next time.

further than the minimum needed to do that. The hack is fundamentally
inconsistent anyway, and breaking our standards compliance further in
pursuit of bogus consistency seems misguided.

Personally I'd rather take out the =NULL conversion anyway...

I'd second that.

Yes. Table 15 is pretty illegible in the ASCII draft copies of SQL92
and SQL99, but the PDF version of SQL99 is okay, and it makes clear
what you'd expect:

OT -- I need to buy a copy of SQL99, but it seems to be split into several
parts (that didn't exist for SQL92). Which one (or more) are the most useful
for PostgreSQL hacking?

For extra credit ;-) ... if you'd like to learn a little bit about the
optimizer, think about teaching clause_selectivity() in
optimizer/path/clausesel.c how to estimate the selectivity of these new
expression nodes. In the case where the argument is a boolean column
that we have statistics for, it should be possible to derive the correct
answer (including accounting for NULLs). If the argument is more
complex than that, you probably can't do anything really intelligent,
but you could handwave away NULLs and then compute the appropriate
function of the clause_selectivity() of the argument.

Sure, I love a challenge ;) -- I'll take a look.

One issue I noticed this morning with this patch is that 't' and 'f' are no
longer being implicitly cast into boolean, i.e. test=# select 't' is true;
?column?
----------
f
(1 row)

test=# select 't'::bool is true;
?column?
----------
t
(1 row)

Any thoughts on where look to fix this?

Thanks,

-- Joe

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#23)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

"Joe Conway" <joseph.conway@home.com> writes:

OT -- I need to buy a copy of SQL99, but it seems to be split into several
parts (that didn't exist for SQL92). Which one (or more) are the most useful
for PostgreSQL hacking?

I find that Part 2 is almost the only one I ever look at. I'm not even
sure what's in the other parts ...

One issue I noticed this morning with this patch is that 't' and 'f' are no
longer being implicitly cast into boolean, i.e. test=# select 't' is true;
?column?
----------
f
(1 row)

Now that you mention it, it looks like all of our constructs that expect
boolean fail to coerce unknown-type literals into bools. The rest of
them raise errors, eg:

regression=# select not 't';
ERROR: argument to NOT is type 'unknown', not 'bool'

but this is pretty bogus. ISTM all these places should try to coerce
their inputs to bool before they complain. This involves calling
can_coerce_type and then coerce_type; it's tedious enough that it'd be
worth setting up a subroutine to do it. I'll add something for that,
and fix the other places too.

regards, tom lane

#25Joe Conway
joseph.conway@home.com
In reply to: Zeugswetter Andreas SB (#1)
1 attachment(s)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

For extra credit ;-) ... if you'd like to learn a little bit about the
optimizer, think about teaching clause_selectivity() in
optimizer/path/clausesel.c how to estimate the selectivity of these new
expression nodes. In the case where the argument is a boolean column
that we have statistics for, it should be possible to derive the correct
answer (including accounting for NULLs). If the argument is more
complex than that, you probably can't do anything really intelligent,
but you could handwave away NULLs and then compute the appropriate
function of the clause_selectivity() of the argument.

Attached is my initial attempt to teach clause_selectivity about BooleanTest
nodes, for review and comment. Please let me know if I'm headed in the right
direction.

Thanks,

Joe

Attachments:

booltestsel_00.diffapplication/octet-stream; name=booltestsel_00.diffDownload
diff -Naur pgsql.virg/src/backend/optimizer/path/clausesel.c pgsql.dev/src/backend/optimizer/path/clausesel.c
--- pgsql.virg/src/backend/optimizer/path/clausesel.c	Thu Jun 21 04:18:11 2001
+++ pgsql.dev/src/backend/optimizer/path/clausesel.c	Sat Jun 23 18:32:17 2001
@@ -22,6 +22,7 @@
 #include "optimizer/plancat.h"
 #include "optimizer/restrictinfo.h"
 #include "parser/parsetree.h"
+#include "utils/builtins.h"
 #include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 
@@ -408,6 +409,14 @@
 											 varRelid);
 			}
 		}
+	}
+	else if (IsA(clause, BooleanTest))
+	{
+		s1 = (Selectivity) DatumGetFloat8(DirectFunctionCall4(booltestsel,
+											 PointerGetDatum(root),
+											 (Datum)0,
+											 PointerGetDatum(clause),
+											 Int32GetDatum(varRelid)));
 	}
 	else if (IsA(clause, Param))
 	{
diff -Naur pgsql.virg/src/backend/utils/adt/selfuncs.c pgsql.dev/src/backend/utils/adt/selfuncs.c
--- pgsql.virg/src/backend/utils/adt/selfuncs.c	Thu Jun 21 04:18:12 2001
+++ pgsql.dev/src/backend/utils/adt/selfuncs.c	Sat Jun 23 19:01:23 2001
@@ -934,6 +934,181 @@
 }
 
 /*
+ *		booltestsel		- Selectivity of BooleanTest Node.
+ */
+Datum
+booltestsel(PG_FUNCTION_ARGS)
+{
+	Query		   *root = (Query *) PG_GETARG_POINTER(0);
+	Oid				operator = PG_GETARG_OID(1);
+	List		   *args = (List *) PG_GETARG_POINTER(2);
+	int				varRelid = PG_GETARG_INT32(3);
+	Var			   *var;
+	Node		   *clause;
+	BooleanTest	   *btest;
+	Node		   *arg;
+	Oid				relid;
+	HeapTuple		statsTuple;
+	Datum		   *values;
+	int				nvalues;
+	float4		   *numbers;
+	int				nnumbers;
+	double			selec = DEFAULT_EQ_SEL;
+
+	operator = operator;	/* keep compiler quiet */
+
+	clause = (Node *) args;
+	/*
+	 * If expression is not BooleanTest
+	 * then punt and return a default estimate.
+	 */
+	if (!IsA(clause, BooleanTest))
+		PG_RETURN_FLOAT8(DEFAULT_EQ_SEL);
+
+	btest = (BooleanTest *) clause;
+	arg = (Node *) btest->arg;
+
+	/*
+	 * Ignore any binary-compatible relabeling
+	 */
+	if (IsA(arg, RelabelType))
+		arg = ((RelabelType *) arg)->arg;
+
+	if (IsA(arg, Var) && (varRelid == 0 || varRelid == ((Var *) arg)->varno))
+		var = (Var *) arg;
+	else	/* punt */
+		PG_RETURN_FLOAT8(DEFAULT_EQ_SEL);
+
+	relid = getrelid(var->varno, root->rtable);
+	if (relid == InvalidOid)
+		PG_RETURN_FLOAT8(DEFAULT_EQ_SEL);
+
+	/* get stats for the attribute, if available */
+	statsTuple = SearchSysCache(STATRELATT,
+								ObjectIdGetDatum(relid),
+								Int16GetDatum(var->varattno),
+								0, 0);
+	if (HeapTupleIsValid(statsTuple))
+	{
+		Form_pg_statistic stats;
+
+		stats = (Form_pg_statistic) GETSTRUCT(statsTuple);
+
+		/*
+		 * Var is being compared to a known non-null constant,
+		 * i.e. true, false, or unknown
+		 */
+		if (get_attstatsslot(statsTuple, var->vartype, var->vartypmod,
+							 STATISTIC_KIND_MCV, InvalidOid,
+							 &values, &nvalues,
+							 &numbers, &nnumbers))
+		{
+			bool	match = false;
+			int		i;
+
+			if (btest->booltesttype == IS_UNKNOWN || btest->booltesttype == IS_NOT_UNKNOWN)
+			{
+				switch (btest->booltesttype)
+			    {
+			        case IS_UNKNOWN:
+						/*
+						 * Only select unknown (null) values.
+						 * Use null fraction directly.
+						 */
+						selec = stats->stanullfrac;
+						break;
+			        case IS_NOT_UNKNOWN:
+						/*
+						 * Only select not unknown (not null) values.
+						 * Calculate from null fraction directly.
+						 */
+						selec = 1.0 - stats->stanullfrac;
+						break;
+			        default:
+			            elog(ERROR, "booltestsel: unexpected booltesttype %d",
+			                 (int) btest->booltesttype);
+			            return (Datum) 0;  /* keep compiler quiet */
+				}
+			}
+			else
+			{
+				for (i = 0; i < nvalues; i++)
+				{
+					switch (btest->booltesttype)
+				    {
+				        case IS_TRUE:
+							match = (values[i] == true);
+							break;
+				        case IS_NOT_TRUE:
+							match = (values[i] == false);
+							break;
+				        case IS_FALSE:
+							match = (values[i] == false);
+							break;
+				        case IS_NOT_FALSE:
+							match = (values[i] == true);
+							break;
+				        default:
+				            elog(ERROR, "booltestsel: unexpected booltesttype %d",
+				                 (int) btest->booltesttype);
+				            return (Datum) 0;  /* keep compiler quiet */
+					}
+					if (match == true)
+						break;
+				}
+				if (match == true)
+				{
+					/*
+					 * Constant is "=" to this common value.  We know
+					 * selectivity exactly (or as exactly as VACUUM
+					 * could calculate it, anyway).
+					 */
+					selec = numbers[i];
+				}
+				else
+				{
+					/*
+					 * Constant is not null and not "=" to this common
+					 * value. Since only true, false, and null are
+					 * possible, calculate it.
+					 */
+					selec = 1.0 - numbers[i] - stats->stanullfrac;
+				}
+			}
+		}
+		else
+		{
+			/*
+			 * no most-common-value info available
+			 */
+			values = NULL;
+			numbers = NULL;
+			nvalues = nnumbers = 0;
+		}
+
+		ReleaseSysCache(statsTuple);
+	}
+	else
+	{
+		/*
+		 * No VACUUM ANALYZE stats available, so make a guess using
+		 * the number of distinct values (2) and assuming they are
+		 * equally common.  (The guess is unlikely to be very good,
+		 * but we do know a few special cases.)
+		 */
+		selec = 0.5;
+	}
+
+	/* result should be in range, but make sure... */
+	if (selec < 0.0)
+		selec = 0.0;
+	else if (selec > 1.0)
+		selec = 1.0;
+
+	PG_RETURN_FLOAT8((float8) selec);
+}
+
+/*
  *		eqjoinsel		- Join selectivity of "="
  */
 Datum
diff -Naur pgsql.virg/src/include/utils/builtins.h pgsql.dev/src/include/utils/builtins.h
--- pgsql.virg/src/include/utils/builtins.h	Thu Jun 21 04:18:21 2001
+++ pgsql.dev/src/include/utils/builtins.h	Sat Jun 23 15:35:02 2001
@@ -355,6 +355,7 @@
 extern Datum icregexnesel(PG_FUNCTION_ARGS);
 extern Datum nlikesel(PG_FUNCTION_ARGS);
 extern Datum icnlikesel(PG_FUNCTION_ARGS);
+extern Datum booltestsel(PG_FUNCTION_ARGS);
 
 extern Datum eqjoinsel(PG_FUNCTION_ARGS);
 extern Datum neqjoinsel(PG_FUNCTION_ARGS);
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#25)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

"Joe Conway" <joseph.conway@home.com> writes:

Attached is my initial attempt to teach clause_selectivity about BooleanTest
nodes, for review and comment. Please let me know if I'm headed in the right
direction.

Looks like you're headed the right way. A few comments ---

1. Don't forget to add a routine for NullTest nodes, too. These can be
simpler, since you know that all you care about is the stanullfrac; no
need to look at common values.

2. I don't see any real good reason to force booltestsel() (likewise
nulltestsel(), when you add it) to have the same call signature as the
various operator selectivity routines. Those have to be looked up in a
table and called via fmgr, but there's no reason to think that
booltestsel will ever be called in the same way. We could certainly
lose the useless "operator" argument. I'd be inclined to declare
booltestsel as

Selectivity booltestsel(Query *root,
BooleanTest *clause,
int varRelid);

and skip *all* the fmgr notational cruft. The only trouble with this is
we'd not want to put such a declaration into builtins.h, because it'd
force including a bunch more .h files into builtins.h, which is ugly
and might even cause some circularity problems. There are already some
selectivity-related things in builtins.h that don't really belong there.
I am starting to think that selfuncs.c deserves its own header file
separate from builtins.h, so that it doesn't clutter builtins.h with a
bunch of selectivity-specific stuff. (Comments anyone?)

3. You should be using tests like "if (DatumGetBool(values[i]))" or
"if (!DatumGetBool(values[i]))" to examine the MCV-list values, not
"if (values[i] == true)". The latter is a type cheat: it assumes that
Datum can be implicitly converted to bool, which is only sort-of true at
the moment and might be not-at-all-true in the future. You could write
"if (DatumGetBool(values[i]) == true)" and satisfy the strict-typing
part of this concern, but there's also a more stylistic thing here.
Personally I find "if (boolean == true)" or "if (boolean == false)"
to be poor style, it should be just "if (boolean)" or "if (!boolean)".
This gets back to whether you consider boolean to be a logically
distinct type from int, I suppose. Feel free to ignore that part
if you strongly disagree.

4. I don't think that DEFAULT_EQ_SEL is an appropriate default result
for this routine. Arguably, it should elog(ERROR) if not passed a
BooleanTest node, so the first one or two occurrences are nonissues
anyway. However, if you find a non-Var argument or a Var that you
can't get statistics for, you do need to return some sort of reasonable
default, and that I think is not the best. In any case the default should
take the test type into account. For IS_NULL and IS_UNKNOWN, I'd think
the default assumption should be small but not zero (0.01 maybe? Any
thoughts out there?). IS_NOT_NULL/UNKNOWN should obviously yield one
minus whatever we use for IS_NULL. It seems reasonable to use 0.5
as default for IS_TRUE and the other three BooleanTests.

5. Actually, for IS_TRUE and the other three BooleanTests, what you
should probably do with a non-Var input is ignore the possibility of a
NULL value and just try to estimate the probability of TRUE vs FALSE,
which you can do by recursing back to clause_selectivity() on the
argument (note this would be invalid for NullTest where the argument
isn't necessarily boolean, but it's legit for BooleanTests). Then
you use either that result or one minus that result, depending on
which BooleanTest you're dealing with.

6. The way you are working with the MCV values seems unnecessarily
complicated. I had a hard time seeing whether it was generating the
right answer (actually, it demonstrably isn't, since for example you
produce the same result for IS_TRUE and IS_NOT_FALSE, which ought to
differ by the stanullfrac amount; but it's too complex anyway).
I'd be inclined to do this in one of two ways:
1. Scan the MCV array to find the entry for "true", and work
with its frequency and the stanullfrac frequency to derive
the appropriate answer depending on the test type.
2. Always use the first (most common val's) frequency. This
value is either true or false, so adjust accordingly.
The simplest way would be to derive true's frequency as
either freq[0] or 1 - freq[0] - stanullfrac, and then
proceed as in #1.
#2 is a little more robust, since it would still work if for some reason
the MCV list contains only an entry for true or only an entry for false.
(I believe that could happen, if the column contains only one value and
possibly some nulls.)

You're off to an excellent start though; you seem to be finding your
way through the code very well.

regards, tom lane

#27Joe Conway
joseph.conway@home.com
In reply to: Zeugswetter Andreas SB (#1)
1 attachment(s)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards

Thanks for your thorough review and comments, Tom.

Here's a new patch for review. Summary of changes/response to earlier
comments:
- add a routine for NullTest nodes -- done.
- declare selectivity functions without fmgr notation -- done.
- create selfuncs.h for declarations -- done, but I didn't move anything
else out of builtins.h
- use DatumGetBool() and adjust style -- done
- create better default selectivities -- done:
- DEFAULT_UNK_SEL = 0.005
- DEFAULT_NOT_UNK_SEL = 1 - DEFAULT_UNK_SEL
- DEFAULT_BOOL_SEL = 0.5
- recurse clause_selectivity() for non-Var input -- done
- simplify MCV logic -- done, used 2nd approach (always use the first most
common val's frequency)

Questions:
- I added a debug define (BOOLTESTDEBUG) to selfuncs.h, and a corresponding
ifdef/elog NOTICE to clause_selectivity(). This was to help me debug/verify
the calculations. Should this be left in the code when I create a patch (it
is in this one), and if so, is there a preferred "standard" approach to this
type of debug code?
- Using the debug code mentioned above, I noted that clause_selectivity()
did not seem to get called at all for clauses like "where myfield = 0" or
"where myfield > 0". I haven't looked too closely at it yet, but I was
wondering if this is expected behavior?

Thanks,

-- Joe

Attachments:

newnodesel_01.diffapplication/octet-stream; name=newnodesel_01.diffDownload
diff -Naur pgsql.virg/src/backend/optimizer/path/clausesel.c pgsql.dev/src/backend/optimizer/path/clausesel.c
--- pgsql.virg/src/backend/optimizer/path/clausesel.c	Thu Jun 21 04:18:11 2001
+++ pgsql.dev/src/backend/optimizer/path/clausesel.c	Sun Jun 24 06:05:57 2001
@@ -24,7 +24,7 @@
 #include "parser/parsetree.h"
 #include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
-
+#include "utils/selfuncs.h"
 
 /* note that pg_type.h hardwires size of bool as 1 ... duplicate it */
 #define MAKEBOOLCONST(val,isnull) \
@@ -409,6 +409,20 @@
 			}
 		}
 	}
+	else if (IsA(clause, NullTest))
+	{
+		/* 
+		 * Use node specific selectivity calculation function
+		 */
+		s1 = nulltestsel(root, (NullTest *) clause, varRelid);
+	}
+	else if (IsA(clause, BooleanTest))
+	{
+		/* 
+		 * Use node specific selectivity calculation function
+		 */
+		s1 = booltestsel(root, (BooleanTest *) clause, varRelid);
+	}
 	else if (IsA(clause, Param))
 	{
 		/* XXX any way to do better? */
@@ -516,6 +530,10 @@
 								((RelabelType *) clause)->arg,
 								varRelid);
 	}
+
+	#ifdef BOOLTESTDEBUG
+	elog(NOTICE, "clause_selectivity: s1 %f", s1);
+	#endif	 /* BOOLTESTDEBUG */
 
 	return s1;
 }
diff -Naur pgsql.virg/src/backend/utils/adt/selfuncs.c pgsql.dev/src/backend/utils/adt/selfuncs.c
--- pgsql.virg/src/backend/utils/adt/selfuncs.c	Thu Jun 21 04:18:12 2001
+++ pgsql.dev/src/backend/utils/adt/selfuncs.c	Sun Jun 24 06:04:30 2001
@@ -94,6 +94,7 @@
 #include "utils/int8.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "utils/selfuncs.h"
 
 /*
  * Note: the default selectivity estimates are not chosen entirely at random.
@@ -117,6 +118,10 @@
 /* default number of distinct values in a table */
 #define DEFAULT_NUM_DISTINCT  200
 
+/* default selectivity estimate for boolean and null test nodes */
+#define DEFAULT_UNK_SEL			0.005
+#define DEFAULT_NOT_UNK_SEL		1 - DEFAULT_UNK_SEL
+#define DEFAULT_BOOL_SEL		0.5
 
 static bool convert_to_scalar(Datum value, Oid valuetypid, double *scaledvalue,
 				  Datum lobound, Datum hibound, Oid boundstypid,
@@ -931,6 +936,359 @@
 	result = patternsel(fcinfo, Pattern_Type_Like_IC);
 	result = 1.0 - result;
 	PG_RETURN_FLOAT8(result);
+}
+
+/*
+ *		booltestsel		- Selectivity of BooleanTest Node.
+ */
+Selectivity
+booltestsel(Query *root, BooleanTest *clause, int varRelid)
+{
+	Var			   *var;
+	Node		   *arg;
+	Oid				relid;
+	HeapTuple		statsTuple;
+	Datum		   *values;
+	int				nvalues;
+	float4		   *numbers;
+	int				nnumbers;
+	double			selec;
+	double			defselec;
+	double			freq_true;
+	double			freq_false;
+	double			freq_null;
+
+	if (!IsA(clause, BooleanTest))
+		elog(ERROR, "booltestsel: unexpected node type %d",
+			(int) clause->type);
+
+	switch (clause->booltesttype)
+    {
+        case IS_UNKNOWN:
+			defselec = DEFAULT_UNK_SEL;
+			break;
+        case IS_NOT_UNKNOWN:
+			defselec = DEFAULT_NOT_UNK_SEL;
+			break;
+        case IS_TRUE:
+        case IS_NOT_FALSE:
+        case IS_FALSE:
+        case IS_NOT_TRUE:
+			defselec = DEFAULT_BOOL_SEL;
+			break;
+        default:
+            elog(ERROR, "booltestsel: unexpected booltesttype %d",
+                 (int) clause->booltesttype);
+            return (Selectivity) 0;  /* keep compiler quiet */
+	}
+
+	arg = (Node *) clause->arg;
+
+	/*
+	 * Ignore any binary-compatible relabeling
+	 */
+	if (IsA(arg, RelabelType))
+		arg = ((RelabelType *) arg)->arg;
+
+	if (IsA(arg, Var) && (varRelid == 0 || varRelid == ((Var *) arg)->varno))
+		var = (Var *) arg;
+	else
+	{
+		/*
+		 * punt
+		 */
+		switch (clause->booltesttype)
+	    {
+	        case IS_UNKNOWN:
+	        case IS_NOT_UNKNOWN:
+				selec = defselec;
+				break;
+	        case IS_TRUE:
+	        case IS_NOT_FALSE:
+				selec = (double) clause_selectivity(root, (Node *) arg, varRelid);
+				break;
+	        case IS_FALSE:
+	        case IS_NOT_TRUE:
+				selec = 1.0 - (double) clause_selectivity(root, (Node *) arg, varRelid);
+				break;
+	        default:
+	            elog(ERROR, "booltestsel: unexpected booltesttype %d",
+	                 (int) clause->booltesttype);
+				selec = 0.0;	/* Keep compiler quiet */
+		}
+		return (Selectivity) selec;
+	}
+
+	relid = getrelid(var->varno, root->rtable);
+	if (relid == InvalidOid)
+			return (Selectivity) defselec;
+
+	/* get stats for the attribute, if available */
+	statsTuple = SearchSysCache(STATRELATT,
+								ObjectIdGetDatum(relid),
+								Int16GetDatum(var->varattno),
+								0, 0);
+	if (HeapTupleIsValid(statsTuple))
+	{
+		Form_pg_statistic stats;
+
+		stats = (Form_pg_statistic) GETSTRUCT(statsTuple);
+		freq_null = stats->stanullfrac;
+
+		/*
+		 * Var is being compared to a known non-null constant,
+		 * i.e. true, false, or unknown
+		 */
+		if (get_attstatsslot(statsTuple, var->vartype, var->vartypmod,
+							 STATISTIC_KIND_MCV, InvalidOid,
+							 &values, &nvalues,
+							 &numbers, &nnumbers))
+		{
+			/*
+			 * Get first MCV frequency and derive frequency for true.
+			 */
+			if (DatumGetBool(values[0]))
+				freq_true = numbers[0];
+			else
+				freq_true = 1.0 - numbers[0] - stats->stanullfrac;
+
+			/*
+			 * Next derive freqency for false and set frequency for null.
+			 * Then use these as appropriate to derive frequency for each case.
+			 */
+			freq_false = 1.0 - freq_true - stats->stanullfrac;
+
+			switch (clause->booltesttype)
+		    {
+		        case IS_UNKNOWN:
+					/*
+					 * Use freq_null directly.
+					 */
+					selec = freq_null;
+					break;
+		        case IS_NOT_UNKNOWN:
+					/*
+					 * Select not unknown (not null) values.
+					 * Calculate from freq_null.
+					 */
+					selec = 1.0 - freq_null;
+					break;
+		        case IS_TRUE:
+					/*
+					 * Use freq_true directly.
+					 */
+					selec = freq_true;
+					break;
+		        case IS_NOT_FALSE:
+					/*
+					 * Select not false values.
+					 * Calculate from freq_false.
+					 */
+					selec = 1 - freq_false;
+					break;
+		        case IS_FALSE:
+					/*
+					 * Use freq_false directly.
+					 */
+					selec = freq_false;
+					break;
+		        case IS_NOT_TRUE:
+					/*
+					 * Select not true values.
+					 * Calculate from freq_true.
+					 */
+					selec = 1 - freq_true;
+					break;
+		        default:
+		            elog(ERROR, "booltestsel: unexpected booltesttype %d",
+		                 (int) clause->booltesttype);
+					/*
+					 * Keep compiler quiet.
+					 */
+					selec = 0.0;
+	            	return (Selectivity) selec;
+			}
+		}
+		else
+		{
+			/*
+			 * Prevent compiler complaints.
+			 */
+			values = NULL;
+			numbers = NULL;
+			nvalues = nnumbers = 0;
+
+			/*
+			 * No most-common-value info available.
+			 * Still have null fraction information,
+			 * so use it for is [not] unknown.
+			 * Otherwise adjust for null fraction and
+			 * assume an even split for boolean tests.
+			 */
+
+			switch (clause->booltesttype)
+		    {
+		        case IS_UNKNOWN:
+					/*
+					 * Use freq_null directly.
+					 */
+					selec = freq_null;
+					break;
+		        case IS_NOT_UNKNOWN:
+					/*
+					 * Select not unknown (not null) values.
+					 * Calculate from freq_null.
+					 */
+					selec = 1.0 - freq_null;
+					break;
+		        case IS_TRUE:
+		        case IS_NOT_FALSE:
+		        case IS_FALSE:
+		        case IS_NOT_TRUE:
+					selec = (1.0 - freq_null) / 2;
+					break;
+		        default:
+		            elog(ERROR, "booltestsel: unexpected booltesttype %d",
+		                 (int) clause->booltesttype);
+					/*
+					 * keep compiler quiet
+					 */
+					selec = 0.0;
+	            	return (Selectivity) selec;
+			}
+		}
+
+		ReleaseSysCache(statsTuple);
+	}
+	else
+	{
+		/*
+		 * No VACUUM ANALYZE stats available, so make a guess using
+		 * the number of distinct values (2) and assuming they are
+		 * equally common.  (The guess is unlikely to be very good,
+		 * but we do know a few special cases.)
+		 */
+		selec = defselec;
+	}
+
+	/* result should be in range, but make sure... */
+	if (selec < 0.0)
+		selec = 0.0;
+	else if (selec > 1.0)
+		selec = 1.0;
+
+	return (Selectivity) selec;
+}
+
+/*
+ *		nulltestsel		- Selectivity of NullTest Node.
+ */
+Selectivity
+nulltestsel(Query *root, NullTest *clause, int varRelid)
+{
+	Var			   *var;
+	Node		   *arg;
+	Oid				relid;
+	HeapTuple		statsTuple;
+	double			selec;
+	double			defselec;
+	double			freq_null;
+
+	if (!IsA(clause, NullTest))
+		elog(ERROR, "nulltestsel: unexpected node type %d",
+			(int) clause->type);
+
+	switch (clause->nulltesttype)
+    {
+        case IS_NULL:
+			defselec = DEFAULT_UNK_SEL;
+			break;
+        case IS_NOT_NULL:
+			defselec = DEFAULT_NOT_UNK_SEL;
+			break;
+        default:
+            elog(ERROR, "nulltestsel: unexpected nulltesttype %d",
+                 (int) clause->nulltesttype);
+            return (Selectivity) 0;  /* keep compiler quiet */
+	}
+
+	arg = (Node *) clause->arg;
+
+	/*
+	 * Ignore any binary-compatible relabeling
+	 */
+	if (IsA(arg, RelabelType))
+		arg = ((RelabelType *) arg)->arg;
+
+	if (IsA(arg, Var) && (varRelid == 0 || varRelid == ((Var *) arg)->varno))
+		var = (Var *) arg;
+	else
+	{
+		/*
+		 * punt
+		 */
+		selec = defselec;
+		return (Selectivity) selec;
+	}
+
+	relid = getrelid(var->varno, root->rtable);
+	if (relid == InvalidOid)
+			return (Selectivity) defselec;
+
+	/* get stats for the attribute, if available */
+	statsTuple = SearchSysCache(STATRELATT,
+								ObjectIdGetDatum(relid),
+								Int16GetDatum(var->varattno),
+								0, 0);
+	if (HeapTupleIsValid(statsTuple))
+	{
+		Form_pg_statistic stats;
+
+		stats = (Form_pg_statistic) GETSTRUCT(statsTuple);
+		freq_null = stats->stanullfrac;
+
+		switch (clause->nulltesttype)
+	    {
+	        case IS_NULL:
+				/*
+				 * Use freq_null directly.
+				 */
+				selec = freq_null;
+				break;
+	        case IS_NOT_NULL:
+				/*
+				 * Select not unknown (not null) values.
+				 * Calculate from freq_null.
+				 */
+				selec = 1.0 - freq_null;
+				break;
+	        default:
+	            elog(ERROR, "nulltestsel: unexpected nulltesttype %d",
+	                 (int) clause->nulltesttype);
+				/*
+				 * Keep compiler quiet.
+				 */
+				selec = 0.0;
+            	return (Selectivity) selec;
+		}
+
+		ReleaseSysCache(statsTuple);
+	}
+	else
+	{
+		/*
+		 * No VACUUM ANALYZE stats available, so make a guess
+		 */
+		selec = defselec;
+	}
+
+	/* result should be in range, but make sure... */
+	if (selec < 0.0)
+		selec = 0.0;
+	else if (selec > 1.0)
+		selec = 1.0;
+
+	return (Selectivity) selec;
 }
 
 /*
diff -Naur pgsql.virg/src/include/utils/selfuncs.h pgsql.dev/src/include/utils/selfuncs.h
--- pgsql.virg/src/include/utils/selfuncs.h	Thu Jan  1 00:00:00 1970
+++ pgsql.dev/src/include/utils/selfuncs.h	Sun Jun 24 06:05:48 2001
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * selfuncs.h
+ *	  Selectivity functions and index cost estimation functions for
+ *	  standard operators and index access methods.
+ *
+ * Portions Copyright (c) 1996-2001, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SELFUNCS_H
+#define SELFUNCS_H
+
+
+#ifdef DEBUGBOOLTEST
+#define BOOLTESTDEBUG
+#endif	 /* DEBUGBOOLTEST */
+
+Selectivity booltestsel(Query *root, BooleanTest *clause, int varRelid);
+Selectivity nulltestsel(Query *root, NullTest *clause, int varRelid);
+
+
+#endif	 /* SELFUNCS_H */