BUG #14494: Regression - Null arrays are not queryable

Started by Linas Vepstasover 9 years ago6 messagesbugs

linasvepstas@gmail.com

over 9 years ago

The following bug has been logged on the website:

Bug reference: 14494
Logged by: Linas Vepstas
Email address: linasvepstas@gmail.com
PostgreSQL version: 9.6.1
Operating system: Debian unstable
Description:

Array handling appears to have regressed,
with lots of surprising results for empty
and null arrays.

CREATE TABLE foo (
name TEXT,
outgoing BIGINT[]
);

INSERT INTO FOO VALUES ('one', '{43, 67}');
INSERT INTO FOO VALUES ('two', '{}');
INSERT INTO FOO VALUES ('three', null);

test=> select * from foo;
name | outgoing
-------+----------
one | {43,67}
two | {}
three |
(3 rows)

very good!

test=> select * from foo where outgoing='{}';
name | outgoing
------+----------
two | {}
(1 row)

very good!

test=> select * from foo where outgoing=null;
name | outgoing
------+----------
(0 rows)

unexpected: wanted one row here.

test=> select * from foo where outgoing != '{43,67}';
name | outgoing
------+----------
two | {}
(1 row)

unexpected: wanted two rows here. Where did the null row go?
test=> select * from foo where array_length(outgoing,1)=2 ;
name | outgoing
------+----------
one | {43,67}
(1 row)

very good

test=> select * from foo where array_length(outgoing,1)=0 ;
name | outgoing
------+----------
(0 rows)

Huh? there is at least one array whose length is zero, and another that
doesn't have a length...

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Andres Freund

andres@anarazel.de

over 9 years ago

In reply to: Linas Vepstas (#1)

Re: BUG #14494: Regression - Null arrays are not queryable

Hi,

On 2017-01-13 18:38:24 +0000, linasvepstas@gmail.com wrote:

The following bug has been logged on the website:

Bug reference: 14494
Logged by: Linas Vepstas
Email address: linasvepstas@gmail.com
PostgreSQL version: 9.6.1
Operating system: Debian unstable
Description:

Array handling appears to have regressed,

Regressed from what? Because these don't look like they'd have changed
recently.

test=> select * from foo where outgoing=null;
name | outgoing
------+----------
(0 rows)

unexpected: wanted one row here.

Comparisons with NULL result in NULL, which evaluates to false in a
WHERE clause. Try WHERE outgoing IS NULL;

test=> select * from foo where outgoing != '{43,67}';
name | outgoing
------+----------
two | {}
(1 row)

unexpected: wanted two rows here. Where did the null row go?

Same answer as above. WHERE outgoing IS DISTINCT FROM ...;

test=> select * from foo where array_length(outgoing,1)=0 ;
name | outgoing
------+----------
(0 rows)

Huh? there is at least one array whose length is zero,

An empty array doesn't have a dimension 1 and thus results in NULL. Yes,
that's a bit confusing.

and another that doesn't have a length...

You mean the NULL one? Why would you expect a 0 as a result then?

Greetings,

Andres Freund

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

David G. Johnston

david.g.johnston@gmail.com

over 9 years ago

In reply to: Linas Vepstas (#1)

Re: BUG #14494: Regression - Null arrays are not queryable

On Fri, Jan 13, 2017 at 11:38 AM, <linasvepstas@gmail.com> wrote:

The following bug has been logged on the website:

Bug reference: 14494
Logged by: Linas Vepstas
Email address: linasvepstas@gmail.com
PostgreSQL version: 9.6.1
Operating system: Debian unstable
Description:

Array handling appears to have regressed,

From what?

with lots of surprising results for empty
and null arrays.

test=> select * from foo where outgoing=null;
name | outgoing
------+----------
(0 rows)

unexpected: wanted one row here.

While you can configure the system to yield "true" for "null = null" by
default it does not - I suspect your prior system has this configuration
option turned on. This is not specific to arrays.

https://www.postgresql.org/docs/9.6/static/runtime-config-compatible.html#RUNTIME-CONFIG-COMPATIBLE-CLIENTS

test=> select * from foo where outgoing != '{43,67}';

name | outgoing
------+----------
two | {}
(1 row)

unexpected: wanted two rows here. Where did the null row go?

Again, null != (non-null) yields null, not true, and would work this way
regardless of the option mentioned above...

test=> select * from foo where array_length(outgoing,1)=0 ;
name | outgoing
------+----------
(0 rows)

Huh? there is at least one array whose length is zero, and another that
doesn't have a length...

Empty arrays and nulls, when provided to array_length, report NULL for all
dimensions.

David J.

Linas Vepstas

linasvepstas@gmail.com

over 9 years ago

In reply to: Andres Freund (#2)

Re: BUG #14494: Regression - Null arrays are not queryable

On Fri, Jan 13, 2017 at 3:05 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-01-13 18:38:24 +0000, linasvepstas@gmail.com wrote:

The following bug has been logged on the website:

Bug reference: 14494
Logged by: Linas Vepstas
Email address: linasvepstas@gmail.com
PostgreSQL version: 9.6.1
Operating system: Debian unstable
Description:

Array handling appears to have regressed,

Regressed from what? Because these don't look like they'd have changed
recently.

I'd have to do forensics, but the two other versions I use are 8.4 and 9.1

Let me explain it this way: programmers have a mental model for strings (in
C/C++, or in python or in perl, etc): they have a length, and in C/C++,
they are null-terminated. There is a concept of a null string: a string of
length zero: its a valid string, that you can treat like any other. Its not
exceptional, or different in any way. Standard string ops work on them:
they behave like strings of length zero.

The mental model of an array would be similar: an array of length zero, but
is still an array, and can be handled using the same kind of code that
non-zero length arrays use. One does not have to write special-case code
for the empty array (in must programming languages).

In postgres, zero-length arrays appear to be confusing, non-standard.
Special-case code needs to be written to handle them, since they are
exceptional. This doubles the amount of unit-testing required, and doubles
the attack surface for bugs. It is an unintuitive and surprising situation,
for those of us, like me, who have data that contains arrays, some of which
are zero-legnth arrays.

I respect that you might want to design postgres the way way it is
currently, but it is counter-intuitive and surprising, and opens the door
to surprising behaviors and bugs in user-land. It doesn't "smell" like a
rational design choice; instead, it smells like something arbitrary or
accidental.

--linas

Show quoted text

test=> select * from foo where outgoing=null;
name | outgoing
------+----------
(0 rows)

unexpected: wanted one row here.

Comparisons with NULL result in NULL, which evaluates to false in a
WHERE clause. Try WHERE outgoing IS NULL;

test=> select * from foo where outgoing != '{43,67}';
name | outgoing
------+----------
two | {}
(1 row)

unexpected: wanted two rows here. Where did the null row go?

Same answer as above. WHERE outgoing IS DISTINCT FROM ...;

test=> select * from foo where array_length(outgoing,1)=0 ;
name | outgoing
------+----------
(0 rows)

Huh? there is at least one array whose length is zero,

An empty array doesn't have a dimension 1 and thus results in NULL. Yes,
that's a bit confusing.

and another that doesn't have a length...

You mean the NULL one? Why would you expect a 0 as a result then?

Greetings,

Andres Freund

David G. Johnston

david.g.johnston@gmail.com

over 9 years ago

In reply to: Linas Vepstas (#4)

Re: BUG #14494: Regression - Null arrays are not queryable

On Fri, Feb 17, 2017 at 5:17 PM, Linas Vepstas <linasvepstas@gmail.com>
wrote:

On Fri, Jan 13, 2017 at 3:05 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-01-13 18:38:24 +0000, linasvepstas@gmail.com wrote:

The following bug has been logged on the website:

Bug reference: 14494
Logged by: Linas Vepstas
Email address: linasvepstas@gmail.com
PostgreSQL version: 9.6.1
Operating system: Debian unstable
Description:

Array handling appears to have regressed,

Regressed from what? Because these don't look like they'd have changed
recently.

I'd have to do

forensics, but the two other versions I use are 8.4 and 9.1

Let me explain it this way: programmers have a mental model for strings
(in C/C++, or in python or in perl, etc): they have a length, and in C/C++,
they are null-terminated. There is a concept of a null string: a string of
length zero: its a valid string, that you can treat like any other. Its not
exceptional, or different in any way. Standard string ops work on them:
they behave like strings of length zero.

The mental model of an array would be similar: an array of length zero,
but is still an array, and can be handled using the same kind of code that
non-zero length arrays use. One does not have to write special-case code
for the empty array (in must programming languages).

In postgres, zero-length arrays appear to be confusing, non-standard.
Special-case code needs to be written to handle them, since they are
exceptional. This doubles the amount of unit-testing required, and doubles
the attack surface for bugs. It is an unintuitive and surprising situation,
for those of us, like me, who have data that contains arrays, some of which
are zero-legnth arrays.

I respect that you might want to design postgres the way way it is
currently, but it is counter-intuitive and surprising, and opens the door
to surprising behaviors and bugs in user-land. It doesn't "smell" like a
rational design choice; instead, it smells like something arbitrary or
accidental.

--linas

It is a shared belief (in differing strengths) that our array behavior is
less than ideal - but it is functional and the opinions of people already
using our software and have working applications based around the present
behavior are afforded considerable weight when deciding whether to fix
something not really broken but also not ideal. Usually the outcome is
that we leave things unchanged.

Also, you speak of an array of having "a length" - but that isn't true in
PostgreSQL. Arrays are multi-dimensional. So at least at a distance you
are constructing a flawed mental model of an array based solely upon
experience with one-dimensional arrays.

In any case that only addresses:

select * from foo where array_length(outgoing,1)=0 ;

IOW - you make good points but so does PostgreSQL, and given present
information pragmatism trumps ideology.

That said, if you actually supply some real examples of what you are doing
you might find people willing to point out potentially better ways to do
the same thing in PostgreSQL without having to worry about corner cases.

SQL has a concept of null for which we need to retain as consistent
handling as possible. That covers your other items. "NULL" is not the
same as <empty> no matter what the underlying data type is (arrays,
strings, etc...). This covers the other two "unexpecteds".

I would advise you minimize how often NULL appears in your data. You
already seem comfortable with the concept of "empty" and in many situations
an empty something is exactly what is meant to be conveyed by the use of
NULL.

If you really want to debate/explore the topic the -general list is the
proper choice. This list should be used for true bug reporting which you
have not yet done. Unintended and undocumented regressions are bugs but so
far that hasn't been shown to be the case. The rest of the exposition is
discussion, not bug reporting or exploring.

David J.

Linas Vepstas

linasvepstas@gmail.com

over 9 years ago

In reply to: David G. Johnston (#5)

Re: BUG #14494: Regression - Null arrays are not queryable

OK. One of these is a change in behavior from earlier versions; I won't be
able to double-check to provide details for at least a few weeks. (or have
any time for other discussion)

In my case, I am layering a graph DB (a hypergraph DB) on top of SQL. The
array is a list of primary keys. Think of a "multi-tree" or a "forest": a
bunch of trees (DAG's) but these may share nodes. Thus all rows are nodes,
the array is all the children of the node, and leaf-nodes have no children:
either an empty array or a null array or a zero-length array, I'm not picky
on the representation.

--linas

On Fri, Feb 17, 2017 at 7:08 PM, David G. Johnston <
david.g.johnston@gmail.com> wrote:

Show quoted text

On Fri, Feb 17, 2017 at 5:17 PM, Linas Vepstas <linasvepstas@gmail.com>
wrote:

On Fri, Jan 13, 2017 at 3:05 PM, Andres Freund <andres@anarazel.de>
wrote:

Hi,

On 2017-01-13 18:38:24 +0000, linasvepstas@gmail.com wrote:

The following bug has been logged on the website:

Bug reference: 14494
Logged by: Linas Vepstas
Email address: linasvepstas@gmail.com
PostgreSQL version: 9.6.1
Operating system: Debian unstable
Description:

Array handling appears to have regressed,

Regressed from what? Because these don't look like they'd have changed
recently.

I'd have to do

forensics, but the two other versions I use are 8.4 and 9.1

Let me explain it this way: programmers have a mental model for strings
(in C/C++, or in python or in perl, etc): they have a length, and in C/C++,
they are null-terminated. There is a concept of a null string: a string of
length zero: its a valid string, that you can treat like any other. Its not
exceptional, or different in any way. Standard string ops work on them:
they behave like strings of length zero.

The mental model of an array would be similar: an array of length zero,
but is still an array, and can be handled using the same kind of code that
non-zero length arrays use. One does not have to write special-case code
for the empty array (in must programming languages).

In postgres, zero-length arrays appear to be confusing, non-standard.
Special-case code needs to be written to handle them, since they are
exceptional. This doubles the amount of unit-testing required, and doubles
the attack surface for bugs. It is an unintuitive and surprising situation,
for those of us, like me, who have data that contains arrays, some of which
are zero-legnth arrays.

I respect that you might want to design postgres the way way it is
currently, but it is counter-intuitive and surprising, and opens the door
to surprising behaviors and bugs in user-land. It doesn't "smell" like a
rational design choice; instead, it smells like something arbitrary or
accidental.

--linas

It is a shared belief (in differing strengths) that our array behavior is
less than ideal - but it is functional and the opinions of people already
using our software and have working applications based around the present
behavior are afforded considerable weight when deciding whether to fix
something not really broken but also not ideal. Usually the outcome is
that we leave things unchanged.

Also, you speak of an array of having "a length" - but that isn't true in
PostgreSQL. Arrays are multi-dimensional. So at least at a distance you
are constructing a flawed mental model of an array based solely upon
experience with one-dimensional arrays.

In any case that only addresses:

select * from foo where array_length(outgoing,1)=0 ;

IOW - you make good points but so does PostgreSQL, and given present
information pragmatism trumps ideology.

That said, if you actually supply some real examples of what you are doing
you might find people willing to point out potentially better ways to do
the same thing in PostgreSQL without having to worry about corner cases.

SQL has a concept of null for which we need to retain as consistent
handling as possible. That covers your other items. "NULL" is not the
same as <empty> no matter what the underlying data type is (arrays,
strings, etc...). This covers the other two "unexpecteds".

I would advise you minimize how often NULL appears in your data. You
already seem comfortable with the concept of "empty" and in many situations
an empty something is exactly what is meant to be conveyed by the use of
NULL.

If you really want to debate/explore the topic the -general list is the
proper choice. This list should be used for true bug reporting which you
have not yet done. Unintended and undocumented regressions are bugs but so
far that hasn't been shown to be the case. The rest of the exposition is
discussion, not bug reporting or exploring.

David J.