BUG #2334: WHERE IN (SUBSELECT) fails when column is null

Started by Patrick Narkinskyabout 20 years ago4 messagesbugs

patrick@narkinsky.com

about 20 years ago

The following bug has been logged online:

Bug reference: 2334
Logged by: Patrick Narkinsky
Email address: patrick@narkinsky.com
PostgreSQL version: 8.1.3
Operating system: Mac OS X
Description: WHERE IN (SUBSELECT) fails when column is null
Details:

This may be expected behavior, but it certainly doesn't seem right to me,
and it works as expected in sqlite.

The database is as follows:

BEGIN TRANSACTION;
create table a (
id integer,
text varchar(20)
);
INSERT INTO a VALUES(0,'test');
INSERT INTO a VALUES(1,'test2');
create table b (
id integer,
a_id integer);
INSERT INTO b VALUES(0,NULL);
INSERT INTO b VALUES(1,NULL);
INSERT INTO b VALUES(2,NULL);
COMMIT;

The following query returns everything in a in sqlite, but returns nothing
in postgresql:

select * from a where a.id not in (select a_id from b);

On postgresql, it works as expected when a_id has a non-null value. I'm not
expert enough on SQL to say which is wrong, but it appears to me that the
SQLite behavior makes a lot more sense.

Stephan Szabo

sszabo@megazone23.bigpanda.com

about 20 years ago

In reply to: Patrick Narkinsky (#1)

Re: BUG #2334: WHERE IN (SUBSELECT) fails when column is null

On Fri, 17 Mar 2006, Patrick Narkinsky wrote:

This may be expected behavior, but it certainly doesn't seem right to me,
and it works as expected in sqlite.

The database is as follows:

BEGIN TRANSACTION;
create table a (
id integer,
text varchar(20)
);
INSERT INTO a VALUES(0,'test');
INSERT INTO a VALUES(1,'test2');
create table b (
id integer,
a_id integer);
INSERT INTO b VALUES(0,NULL);
INSERT INTO b VALUES(1,NULL);
INSERT INTO b VALUES(2,NULL);
COMMIT;

The following query returns everything in a in sqlite, but returns nothing
in postgresql:

select * from a where a.id not in (select a_id from b);

AFAICS, our behavior follows SQL.

a NOT IN b is NOT(a IN b)
IN is defined in terms of = ANY.
a =ANY (b) is basically (by my reading of 8.8 anyway):
True if a = bi for some bi in b
False if b is empty or a <> bi for all bi in b
Unknown otherwise
Since a <> NULL returns unknown, the second one won't come up, so the
whole expression won't ever be true after the negation. It might be false
or it might be unknown.

Marinos Yannikos

mjy@geizhals.at

about 20 years ago

In reply to: Stephan Szabo (#2)

Re: BUG #2334: WHERE IN (SUBSELECT) fails when column is null

Stephan Szabo schrieb:

AFAICS, our behavior follows SQL.

a NOT IN b is NOT(a IN b)
IN is defined in terms of = ANY.
a =ANY (b) is basically (by my reading of 8.8 anyway):
True if a = bi for some bi in b
False if b is empty or a <> bi for all bi in b
Unknown otherwise
Since a <> NULL returns unknown, the second one won't come up, so the
whole expression won't ever be true after the negation. It might be false
or it might be unknown.

Not having read 8.8, I encountered this today and found it odd as well.
It would mean that the old popular optimization, back when "A IN B" was
much slower, was not correct:

select * from foo where a not in (select b from bar)

used to be written as:

select * from foo where not exists (select 1 from bar where a=b)

These queries have different results now when b is NULL for some rows.
It doesn't look right to me (but if the Standard requires it, what can
we do...).

Regards,
Marinos

Stephan Szabo

sszabo@megazone23.bigpanda.com

about 20 years ago

In reply to: Marinos Yannikos (#3)

Re: BUG #2334: WHERE IN (SUBSELECT) fails when column is null

On Wed, 22 Mar 2006, Marinos Yannikos wrote:

Stephan Szabo schrieb:

AFAICS, our behavior follows SQL.

a NOT IN b is NOT(a IN b)
IN is defined in terms of = ANY.
a =ANY (b) is basically (by my reading of 8.8 anyway):
True if a = bi for some bi in b
False if b is empty or a <> bi for all bi in b
Unknown otherwise
Since a <> NULL returns unknown, the second one won't come up, so the
whole expression won't ever be true after the negation. It might be false
or it might be unknown.

Not having read 8.8, I encountered this today and found it odd as well.
It would mean that the old popular optimization, back when "A IN B" was
much slower, was not correct:

select * from foo where a not in (select b from bar)

used to be written as:

select * from foo where not exists (select 1 from bar where a=b)

Yep, in->exists I believe is the same, but not in->not exists is
different. Exists and subqueries should probably have been done
differently in SQL, but alas.

These queries have different results now when b is NULL for some rows.
It doesn't look right to me (but if the Standard requires it, what can
we do...).

It actually makes some sense if you think about null as an unknown value.

If you ask is 1 in the set (1, 2, unknown), you can definately say yes.
If you ask is 3 in the set (1, 2, unknown), you can't be sure, because
that unknown might be 3.
For any x that's of the correct type for the set, you'll never be able to
say no due to that unknown.

If you ask is 1 not in the set (1, 2, unknown) you can definately say no.
If you ask is 3 not in the set (1, 2, unknown) you again can't be sure.
For any x that's of the correct type for the set, you'll never be able to
say yes due to that unknown.