Question on not-in and array-eq

Started by Zhenghua Lyuabout 4 years ago3 messages
#1Zhenghua Lyu
zlyu@vmware.com

Hi,

I run the following SQL in Postgres (14_STABLE), and got the results:
zlyu=# create table t1(a int, b int);
CREATE TABLE
zlyu=# create table t2(a int, b int);
CREATE TABLE
zlyu=# insert into t1 values (null, 1);
INSERT 0 1
zlyu=# insert into t2 values (1, 1);
INSERT 0 1
zlyu=# select * from t1 where (a, b) not in (select * from t2);
a | b
---+---
(0 rows)

zlyu=# select * from t1 where (a, b) in (select * from t2);
a | b
---+---
(0 rows)

zlyu=# select * from t1 where array[a, b] in (select array[a,b] from t2);
a | b
---+---
(0 rows)

zlyu=# select * from t1 where array[a, b] not in (select array[a,b] from t2);
a | b
---+---
| 1
(1 row)

I run the SQL without array expr​ in other DBs(orcale, sqlite, ...), they all behave
the same as Postgres.

It seems a bit confusing for me that 'not in' and 'in' the same subquery both return 0
rows, but the table contains data.

Also, manually using array expression behaves differently from the first SQL. For not in case,
I step in the code, and find array_eq will consider null = null as true, however ExecSubPlan will
consider null as unprovable and exclude that row.

How to understand the result? It seems SQL standard does not mention array operation for null
value.

Thanks!

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Zhenghua Lyu (#1)
Re: Question on not-in and array-eq

On Wed, Dec 8, 2021 at 8:15 AM Zhenghua Lyu <zlyu@vmware.com> wrote:

I run the SQL without array expr in other DBs(orcale, sqlite, ...), they
all behave
the same as Postgres.

It seems a bit confusing for me that 'not in' and 'in' the same subquery
both return 0
rows, but the table contains data.

Because of this dynamic the reliable negation of "in" is "not (... in ...)"
as opposed to "not in".

https://www.postgresql.org/docs/current/functions-comparison.html

"If the expression is row-valued, then IS NULL is true when the row
expression itself is null or when all the row's fields are null, while IS
NOT NULL is true when the row expression itself is non-null and all the
row's fields are non-null. Because of this behavior, IS NULL and IS NOT
NULL do not always return inverse results for row-valued expressions; in
particular, a row-valued expression that contains both null and non-null
fields will return false for both tests."

The implications of the IS NULL treatment extends to equality checks and
thus the "[NOT] IN" expression.

Also, manually using array expression behaves differently from the first

SQL. For not in case,
I step in the code, and find array_eq will consider null = null as true,
however ExecSubPlan will
consider null as unprovable and exclude that row.

How to understand the result? It seems SQL standard does not mention array
operation for null
value.

When comparing two non-null array variables the result will be either true
or false. If either of the array variables, as a whole, is null the result
will be null. This is due to the general rule that operations on null
values result in null. And the general desire to make array comparisons
behave in the manner expected by users as opposed to the surprising result
that row-valued values provide. The two simply are defined to behave
differently - mainly due to the fact that for row-valued data we choose to
adhere to the SQL Standard.

David J.

#3Zhenghua Lyu
zlyu@vmware.com
In reply to: David G. Johnston (#2)
Re: Question on not-in and array-eq

Thanks for your explanation.
________________________________
From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Wednesday, December 8, 2021 11:39 PM
To: Zhenghua Lyu <zlyu@vmware.com>
Cc: pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
Subject: Re: Question on not-in and array-eq

On Wed, Dec 8, 2021 at 8:15 AM Zhenghua Lyu <zlyu@vmware.com<mailto:zlyu@vmware.com>> wrote:
I run the SQL without array expr in other DBs(orcale, sqlite, ...), they all behave
the same as Postgres.

It seems a bit confusing for me that 'not in' and 'in' the same subquery both return 0
rows, but the table contains data.

Because of this dynamic the reliable negation of "in" is "not (... in ...)" as opposed to "not in".

https://www.postgresql.org/docs/current/functions-comparison.html&lt;https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fdocs%2Fcurrent%2Ffunctions-comparison.html&amp;data=04%7C01%7Czlyu%40vmware.com%7C2213729fc82044f4f8f808d9ba60fa8d%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637745747952431673%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=%2F6eeHObPA3NZ5zDjQlTISgnjXrinLQojBtdtpFzJ018%3D&amp;reserved=0&gt;

"If the expression is row-valued, then IS NULL is true when the row expression itself is null or when all the row's fields are null, while IS NOT NULL is true when the row expression itself is non-null and all the row's fields are non-null. Because of this behavior, IS NULL and IS NOT NULL do not always return inverse results for row-valued expressions; in particular, a row-valued expression that contains both null and non-null fields will return false for both tests."

The implications of the IS NULL treatment extends to equality checks and thus the "[NOT] IN" expression.

Also, manually using array expression behaves differently from the first SQL. For not in case,
I step in the code, and find array_eq will consider null = null as true, however ExecSubPlan will
consider null as unprovable and exclude that row.

How to understand the result? It seems SQL standard does not mention array operation for null
value.

When comparing two non-null array variables the result will be either true or false. If either of the array variables, as a whole, is null the result will be null. This is due to the general rule that operations on null values result in null. And the general desire to make array comparisons behave in the manner expected by users as opposed to the surprising result that row-valued values provide. The two simply are defined to behave differently - mainly due to the fact that for row-valued data we choose to adhere to the SQL Standard.

David J.