Inconsistent behavior on Array & Is Null?

Started by Josh Berkus, about 22 years ago, 24 messages, pgsql-hackers
#1Josh Berkus
josh@agliodbs.com

Joe,

First off, pardon me if these are known things which are already fixed in CVS.
Also, let me again thank you for all the work on Arrays in 7.4; I've been
able to tremendously simplify quite a number of procedures in my databases
thanks to the new array support.

Now, the issue:
I'm noticing some inconsistent behavior regarding empty arrays and IS NULL
status. For example:

net_test=# select array_upper('{}'::INT[], 1);
array_upper
-------------

(1 row)

net_test=# select array_upper('{}'::INT[], 1) IS NULL;
?column?
----------
t
(1 row)

net_test=# select '{}'::INT[] IS NULL;
?column?
----------
f
(1 row)

I feel that this is confusing; an empty array should be considered NULL
everywhere or nowhere. For that matter, the new array declaration syntax
does not support empty arrays:

net_test=# select ARRAY[ ]::INT[];
ERROR: syntax error at or near "]" at character 15
net_test=# select ARRAY[]::INT[];
ERROR: syntax error at or near "]" at character 14
net_test=#

--
-Josh Berkus
Aglio Database Solutions
San Francisco

#2Joe Conway
mail@joeconway.com
In reply to: Josh Berkus (#1)
Re: Inconsistent behavior on Array & Is Null?

Josh Berkus wrote:

I'm noticing some inconsistent behavior regarding empty arrays and IS NULL
status. For example:

net_test=# select array_upper('{}'::INT[], 1) IS NULL;
?column?
----------
t
(1 row)

This is correct. There are no dimensions to an empty array by
definition. The only other way to handle this would be an ERROR. I
followed the lead of (the pre-existing function) array_dims() when
creating array_upper() and array_lower().

net_test=# select '{}'::INT[] IS NULL;
?column?
----------
f
(1 row)

This is also correct, and completely orthogonal to the first example.
There is a difference between an empty array and NULL, just like there
is between an empty string and NULL.

I feel that this is confusing; an empty array should be considered NULL
everywhere or nowhere.

As I said above, that makes no more sense than saying '' == NULL

For that matter, the new array declaration syntax does not support
empty arrays:
net_test=# select ARRAY[ ]::INT[];
ERROR: syntax error at or near "]" at character 15

This is a known issue, and will not be easily fixed. We discussed it at
some length last June/July. See especially:

http://archives.postgresql.org/pgsql-hackers/2003-06/msg01174.php
http://archives.postgresql.org/pgsql-hackers/2003-06/msg01195.php
http://archives.postgresql.org/pgsql-hackers/2003-06/msg01196.php
http://archives.postgresql.org/pgsql-hackers/2003-06/msg01298.php

Joe

#3Josh Berkus
josh@agliodbs.com
In reply to: Joe Conway (#2)
Re: Inconsistent behavior on Array & Is Null?

Joe,

This is correct. There are no dimensions to an empty array by
definition. The only other way to handle this would be an ERROR. I
followed the lead of (the pre-existing function) array_dims() when
creating array_upper() and array_lower().

What about a 0? That seems more consistent to me. If the array is empty,
its dimensions are not "NULL", meaning "unknown", but in fact zero elements,
which is a known value. The way it works now, array_upper on a NULL array
produces the same results as array_upper on an empty-but-non-null array.

Or is there some concept I'm missing?

--
-Josh Berkus
Aglio Database Solutions
San Francisco

#4Joe Conway
mail@joeconway.com
In reply to: Josh Berkus (#3)
Re: Inconsistent behavior on Array & Is Null?

Josh Berkus wrote:

What about a 0? That seems more consistent to me. If the array
is empty, its dimensions are not "NULL", meaning "unknown", but in
fact zero elements, which is a known value.

They cannot be 0 because 0 is a real index. They are undefined, because
an empty array has no dimensions. It is entirely possible to have a real
array that starts at a lower bound of 0 (or even an upper bound of 0).

regression=# select f[0] from (select 99 || array[1,2,3] as f) as t;
f
----
99
(1 row)

regression=# create table a1 (f int[]);
CREATE TABLE
regression=# insert into a1 values('{}');
INSERT 18688045 1
regression=# update a1 set f[0] = 99;
UPDATE 1
regression=# select array_upper(f,1) from a1;
array_upper
-------------
0
(1 row)

The way it works now, array_upper on a NULL array produces the same
results as array_upper on an empty-but-non-null array.

Sure, and in both cases array_upper is undefined because there are no
array dimensions to speak of. I guess you might argue that array_upper,
array_lower, and array_dims should all produce an ERROR on null input
instead of NULL. But that would have been an un-backward compatible
change for array_dims at the time array_lower and array_upper were
created. I don't really believe they should throw an ERROR on an empty
array though.

Joe

#5Josh Berkus
josh@agliodbs.com
In reply to: Joe Conway (#4)
Re: Inconsistent behavior on Array & Is Null?

Joe,

Sure, and in both cases array_upper is undefined because there are no
array dimensions to speak of. I guess you might argue that array_upper,
array_lower, and array_dims should all produce an ERROR on null input
instead of NULL. But that would have been an un-backward compatible
change for array_dims at the time array_lower and array_upper were
created. I don't really believe they should throw an ERROR on an empty
array though.

OK, I understand the logic now. Thanks.

I guess this is another case where we're haunted by the ANSI committee's
failure to define both an "unknown" and a "not applicable" value instead of
the unitary NULL.

--
-Josh Berkus
Aglio Database Solutions
San Francisco

#6Greg Stark
gsstark@mit.edu
In reply to: Josh Berkus (#3)
Re: Inconsistent behavior on Array & Is Null?

Josh Berkus <josh@agliodbs.com> writes:

Joe,

This is correct. There are no dimensions to an empty array by
definition. The only other way to handle this would be an ERROR. I
followed the lead of (the pre-existing function) array_dims() when
creating array_upper() and array_lower().

What about a 0? That seems more consistent to me. If the array is empty,
its dimensions are not "NULL", meaning "unknown", but in fact zero elements,
which is a known value. The way it works now, array_upper on a NULL array
produces the same results as array_upper on an empty-but-non-null array.

Or is there some concept I'm missing?

I would certainly second that. Consider all that making it NULL breaks:

length(a) != array_upper(a)-array_lower(a)

array_upper(a||b) == array_upper(a)+length(b)

If someone implements pop and push, it sure makes things weird that push
doesn't always increment the length and pop doesn't decrement the length until
it reaches 0.

Perhaps you're worried that you have to pick an arbitrary lower and upper bound
and, strangely, that the upper bound would actually be one less than the lower
bound such as [1,0]. However this isn't really any different than the normal
case. All arrays in postgres have arbitrary lower bounds.

Fwiw, the int_aggregate function from int_aggregate.c in the contrib section
makes arrays that don't violate these invariants. For empty arrays the
dimensions are [0,-1].

This isn't hypothetical for me. Whenever I end up replacing int_aggregate with
a standard function that constructs your style arrays my app would break. I
guess I'll have to add a coalesce(...,0) to my array_length() function to work
around it. Which only seems like strong evidence it's the wrong behaviour.

--
greg

#7Joe Conway
mail@joeconway.com
In reply to: Greg Stark (#6)
Re: Inconsistent behavior on Array & Is Null?

Greg Stark wrote:

length(a) != array_upper(a)-array_lower(a)

[You really meant "array_upper(a) - array_lower(a) + 1" I'd guess]

length(A) is a missing function at the moment; the spec actually calls
it CARDINALITY. Once available, you would use it to determine array
length. SQL2003 says:

The result of <cardinality expression> is the number of elements of
the result of the <collection value expression>.

So, when A is an empty array, CARDINALITY(A) = 0, by definition.
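
That semantics can be sketched as a plain SQL function (an illustrative, hypothetical helper covering only one-dimensional arrays; the name cardinality_1d is made up):

```sql
-- emulate spec CARDINALITY for one-dimensional arrays:
-- the number of elements, and 0 for an empty array
create or replace function cardinality_1d(anyarray)
returns int as '
  -- for an empty array both bounds are NULL, so the arithmetic
  -- yields NULL and coalesce maps that to 0
  select coalesce(array_upper($1, 1) - array_lower($1, 1) + 1, 0)
' language sql;

select cardinality_1d(array[1,2,3]);  -- 3
select cardinality_1d('{}'::int[]);   -- 0
```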

array_upper(a||b) == array_upper(a)+length(b)

Same here; this would be:

array_upper(a || b) == array_upper(a) + CARDINALITY(b)

and would work just fine. Note that if array-a is NULL, then the spec
defines a || b as NULL. See section 6.35:

2) If <array concatenation> is specified, then let AV1 be the value of
<array value expression 1> and let AV2 be the value of
<array primary>.
Case:
a) If either AV1 or AV2 is the null value, then the result of the
<array concatenation> is the null value.
b) If the sum of the cardinality of AV1 and the cardinality of AV2
is greater than IMDC, then an exception condition is raised:
data exception -- array data, right truncation.
c) Otherwise, the result is the array comprising every element of
AV1 followed by every element of AV2.

If someone implements pop and push, it sure makes things weird that push
doesn't always increment the length and pop doesn't decrement the length until
it reaches 0.

I have no idea what you're trying to say here. Current behavior
certainly increments length by one when you push an element (which is
what "array || element" effectively does). An empty array has length 0
before pushing an element on to it, and length 1 afterward. Pushing an
element onto a NULL array yields NULL, which is not explicitly defined
by the spec (that I can find), but is certainly consistent with the above.

As far as array_pop is concerned, we discussed the fact that it makes no
sense in the context of Postgres arrays -- see the archives from last
year in May.

Perhaps you're worried that you have to pick an arbitrary lower and upper bound
and, strangely, that the upper bound would actually be one less than the lower
bound such as [1,0]. However this isn't really any different than the normal
case. All arrays in postgres have arbitrary lower bounds.

Again, I have no idea what you mean here.

Fwiw, the int_aggregate function from int_aggregate.c in the contrib section
makes arrays that don't violate these invariants. For empty arrays the
dimensions are [0,-1].

Seems rather arbitrary to me. As I said to Josh, an empty array has
undefined bounds, literally.

This isn't hypothetical for me. Whenever I end up replacing int_aggregate with
a standard function that constructs your style arrays my app would break. I
guess I'll have to add a coalesce(...,0) to my array_length() function to work
around it. Which only seems like strong evidence it's the wrong behaviour.

Sorry, but these are not "my style" arrays, they are "Berkeley style" ;-).

Anyway, CARDINALITY is what you really need -- hopefully I'll be able to
find time to address that and some other array items before the 7.5
freeze. In the meantime, if you have a custom array_length function
already, why not make it return 0 for empty arrays -- then your problems
disappear:

create or replace function array_length(anyarray)
returns int as '
select
case
when $1 = ''{}'' then 0
else array_upper($1, 1) - array_lower($1, 1) + 1
end
' language sql;

CREATE FUNCTION
regression=# select array_length(array[1,2,3]);
array_length
--------------
3
(1 row)

regression=# select array_length('{}'::int4[]);
array_length
--------------
0
(1 row)

regression=# select array[1,2,3] || '{}'::int4[];
?column?
----------
{1,2,3}
(1 row)

regression=# select array_upper(array[1,2,3], 1) +
array_length('{}'::int4[]);
?column?
----------
3
(1 row)

HTH,

Joe

#8Greg Stark
gsstark@mit.edu
In reply to: Joe Conway (#7)
Re: Inconsistent behavior on Array & Is Null?

Joe Conway <mail@joeconway.com> writes:

Same here; this would be:

array_upper(a || b) == array_upper(a) + CARDINALITY(b)

and would work just fine. Note that if array-a is NULL, then the spec defines a
|| b as NULL. See section 6.35:

Why are you talking about when a is NULL? The original question was for when a
was an empty array. That's not an unknown value, it's known to be an array
containing no elements.

It sounds like this is the same type of confusion that led to Oracle treating
empty strings as NULL, which causes no end of headaches.

Anyway, CARDINALITY is what you really need -- hopefully I'll be able to find
time to address that and some other array items before the 7.5 freeze. In the
meantime, if you have a custom array_length function already, why not make it
return 0 for empty arrays -- then your problems disappear:

create or replace function array_length(anyarray)
returns int as '
select
case
when $1 = ''{}'' then 0
else array_upper($1, 1) - array_lower($1, 1) + 1
end
' language sql;

My argument was that having to write a special case here makes it pretty clear
the idea of equating {} with NULL is the wrong interface.

--
greg

#9Joe Conway
mail@joeconway.com
In reply to: Greg Stark (#8)
Re: Inconsistent behavior on Array & Is Null?

Greg Stark wrote:

Joe Conway <mail@joeconway.com> writes:

Same here; this would be:

array_upper(a || b) == array_upper(a) + CARDINALITY(b)

and would work just fine. Note that if array-a is NULL, then the spec defines a
|| b as NULL. See section 6.35:

Why are you talking about when a is NULL? The original question was for when a
was an empty array. That's not an unknown value, it's known to be an array
containing no elements.

Did you even look at my examples at the end of the post? I showed your
example, with an empty array, handled correctly. The mention of a NULL
array was only for completeness.

It sounds like this is the same type of confusion that led to Oracle treating
empty strings as NULL, which causes no end of headaches.

ISTM that you're the one who's confused. There is a very clear
distinction between a NULL array and an empty array in the present
implementation. They are *not* treated the same:

regression=# select '{}'::int[], NULL::int[];
int4 | int4
------+------
{} |
(1 row)

regression=# select '{}'::int[] is NULL, NULL::int[] is NULL;
?column? | ?column?
----------+----------
f | t
(1 row)

You seem to be saying that because the output of certain functions that
operate on empty arrays is NULL, it somehow implies that the array is
being treated as NULL -- that's just plain incorrect.

My argument was that having to write a special case here makes it pretty clear
the idea of equating {} with NULL is the wrong interface.

But they're not being equated (see above), so I don't see where there's
an issue.

Joe

#10Greg Stark
gsstark@mit.edu
In reply to: Joe Conway (#9)
Re: Inconsistent behavior on Array & Is Null?

Joe Conway <mail@joeconway.com> writes:

You seem to be saying that because the output of certain functions that operate
on empty arrays is NULL, it somehow implies that the array is being treated as
NULL -- that's just plain incorrect.

Not incorrect, but a sign something weird is going on. It should only happen
if the certain functions really are looking for some property that isn't known
for that peculiar value of their parameters. iscomputable(n) perhaps, sqrt(-1)
if we abuse the rules a bit. But array_upper for a non-null array?

array_lower() and array_upper() are returning NULL for a non-null input, the
empty array, even though lower and upper bounds are known just as well as they
are for any other sized array. They are behaving as if there's something
unknown about the empty array that makes it hard to provide a lower bound or
upper bound.

slo=> select array_lower('{}'::int[],1), array_upper('{}'::int[],1);
array_lower | array_upper
-------------+-------------
|
(1 row)

I know it's possible to work around this special case, but I'm saying it's odd
to have an irregularity in the interface. What justification is there for
breaking the invariant length = upper-lower+1 ?

Yes I read the examples you gave, but you a) had to work around the nit with a
special case in your function and b) still have corner cases where one of the
invariants I named fails, namely:

Same here; this would be:

array_upper(a || b) == array_upper(a) + CARDINALITY(b)

and would work just fine. Note that if array-a is NULL, then the spec
defines a || b as NULL. See section 6.35:

test=# select array_upper(a||b, 1), array_upper(a,1)+array_length(b) from (select array[1,2] as a, array[1,2] as b) as x;
array_upper | ?column?
-------------+----------
4 | 4
(1 row)

test=# select array_upper(a||b, 1), array_upper(a,1)+array_length(b) from (select '{}'::int[] as a, array[1,2] as b) as x;
array_upper | ?column?
-------------+----------
2 |
(1 row)

--
greg

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#10)
Re: Inconsistent behavior on Array & Is Null?

Greg Stark <gsstark@mit.edu> writes:

array_lower() and array_upper() are returning NULL for a non-null
input, the empty array, even though lower and upper bounds are known
just as well as they are for any other sized array. They are behaving
as if there's something unknown about the empty array that makes it
hard to provide a lower bound or upper bound.

Hm. My idea of the index lower bound is "the smallest index for which
there is an array member" --- so I agree with Joe that it's not very
well defined for an empty array. We could pick some arbitrary value,
such as zero, for the LB and UB to be reported for an empty array, but
I can't see any way that you could justify them except as arbitrary
choices.

I think that maybe we ought to question these two properties:
* empty array is different from NULL ... really? Why?
* storing a value into an element of a NULL array yields
a NULL array instead of a singleton array.
IIRC the latter is presently true because we couldn't figure out
just what dimensionality to assign, but it might be easier to agree on
that than to resolve these other arguments...

regards, tom lane

#12Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#11)
Re: Inconsistent behavior on Array & Is Null?

Tom Lane <tgl@sss.pgh.pa.us> writes:

Hm. My idea of the index lower bound is "the smallest index for which
there is an array member" --- so I agree with Joe that it's not very
well defined for an empty array.

Hm. The problem is that they have other implications. Like the upper bound is
one less than the index an element added to the upper end will get, and
similarly the lower bound is one more than the index that would be assigned to
an element added on the low end.

Currently there is a lower bound and upper bound in the implementation even
for empty arrays. I have empty arrays in my table that have a lower bound of
0, and they behave slightly differently than arrays with lower bounds of 1.

I think that maybe we ought to question these two properties:
* empty array is different from NULL ... really? Why?
* storing a value into an element of a NULL array yields
a NULL array instead of a singleton array.

Well that breaks other things. Then lots of functions have to become
non-strict to work properly because they should have valid output when passed
null values. Ick.

I'm leaning towards suggesting that postgres should follow sql-99 here and
normalize all array indexes to have a lower bound of 1. Then array_lower and
array_upper become entirely unnecessary. Instead we just have array_length
which is exactly equivalent to my idea of array_upper.
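
Under that normalization the function becomes a thin wrapper (a hypothetical sketch; it assumes every array's lower bound really is 1, so the upper bound and the length coincide, and uses coalesce to map the empty array's NULL bound to 0):

```sql
-- array_length under a normalized lower bound of 1:
-- upper bound == length, and an empty array has length 0
create or replace function array_length(anyarray)
returns int as '
  select coalesce(array_upper($1, 1), 0)
' language sql;
```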

--
greg

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#12)
Re: Inconsistent behavior on Array & Is Null?

Greg Stark <gsstark@mit.edu> writes:

Well that breaks other things. Then lots of functions have to become
non-strict to work properly because they should have valid output when passed
null values. Ick.
...
I'm leaning towards suggesting that postgres should follow sql-99 here and
normalize all array indexes to have a lower bound of 1.

That would break even more things, no?

On the other hand, it'd get rid of the problem that we presently face
with dump/restore of arrays that don't have lower bound 1. Because
pg_dump doesn't do anything to mark such values, they'll end up with
lower bound 1 after reload anyway. The fact that we haven't heard lots
of squawks about that suggests to me that not many people are using such
arrays at present ...

regards, tom lane

#14Joe Conway
mail@joeconway.com
In reply to: Greg Stark (#10)
Re: Inconsistent behavior on Array & Is Null?

Greg Stark wrote:

array_lower() and array_upper() are returning NULL for a non-null input, the
empty array, even though lower and upper bounds are known just as well as they
are for any other sized array. They are behaving as if there's something
unknown about the empty array that makes it hard to provide a lower bound or
upper bound.

Sorry, but I still disagree. There *is* something unknown about the
lower and upper bound of an empty array because there are no bounds.

I know it's possible to work around this special case, but I'm saying it's odd
to have an irregularity in the interface. What justification is there for
breaking the invariant length = upper-lower+1 ?

I don't see the spec defined CARDINALITY as a workaround. It defines
length as the number of elements in the array. When the array is empty,
that value is clearly 0. Nothing strange about it.

Yes I read the examples you gave, but you a) had to work around the nit with a
special case in your function and b) still have corner cases where one of the
invariants I named fails, namely:

test=# select array_upper(a||b, 1), array_upper(a,1)+array_length(b) from (select '{}'::int[] as a, array[1,2] as b) as x;
array_upper | ?column?
-------------+----------
2 |
(1 row)

OK, you got me with this corner case. But using what you described as
the result int_aggregate would give you in this case (-1), you would get
an even stranger answer (-1 + 2 = 1) that would still need to be worked
around.

Joe

#15Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#11)
Re: Inconsistent behavior on Array & Is Null?

Tom Lane wrote:

I think that maybe we ought to question these two properties:
* empty array is different from NULL ... really? Why?

I think this makes sense, similar to the difference between '' and NULL.

* storing a value into an element of a NULL array yields
a NULL array instead of a singleton array.

Same argument. If you think of text as an array or characters, you get
this analogy (sort of):

regression=# create table s1(f1 int, f2 text);
CREATE TABLE
regression=# insert into s1 values(1, null);
INSERT 164679 1

regression=# select f1, substr(f2, 1, 1) is null from s1;
f1 | ?column?
----+----------
1 | t
(1 row)

regression=# update s1 set f2 = 'a' || substr(f2, 2);
UPDATE 1
regression=# select f1, substr(f2, 1, 1) is null from s1;
f1 | ?column?
----+----------
1 | t
(1 row)

Joe

#16Joe Conway
mail@joeconway.com
In reply to: Greg Stark (#12)
Re: Inconsistent behavior on Array & Is Null?

Greg Stark wrote:

I'm leaning towards suggesting that postgres should follow sql-99 here and
normalize all array indexes to have a lower bound of 1. Then array_lower and
array_upper become entirely unnecessary. Instead we just have array_length
which is exactly equivalent to my idea of array_upper.

Now we finally have something to agree on ;-)

I do think this is the way to go, but it is a significant hit to
backward compatibility. Same is true for supporting NULL elements of
arrays -- maybe we should bite the bullet and make both changes at the
same time?

Joe

#17Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#13)
Re: Inconsistent behavior on Array & Is Null?

Tom Lane wrote:

Greg Stark <gsstark@mit.edu> writes:

I'm leaning towards suggesting that postgres should follow sql-99 here and
normalize all array indexes to have a lower bound of 1.

That would break even more things, no?

On the other hand, it'd get rid of the problem that we presently face
with dump/restore of arrays that don't have lower bound 1. Because
pg_dump doesn't do anything to mark such values, they'll end up with
lower bound 1 after reload anyway. The fact that we haven't heard lots
of squawks about that suggests to me that not many people are using such
arrays at present ...

The more I think about it, the more I like it. Does everyone else agree
that a lower bound of 1 complies with the spec?

Joe

#18Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#13)
Re: Inconsistent behavior on Array & Is Null?

Tom Lane <tgl@sss.pgh.pa.us> writes:

That would break even more things, no?

On the other hand, it'd get rid of the problem that we presently face
with dump/restore of arrays that don't have lower bound 1. Because
pg_dump doesn't do anything to mark such values, they'll end up with
lower bound 1 after reload anyway. The fact that we haven't heard lots
of squawks about that suggests to me that not many people are using such
arrays at present ...

You have to be using not only arrays, but the new 7.4 functions provided to
manipulate them. In fact I think you have to be using array_prepend
specifically. But even there, since it's not a mutator, it's really not that
surprising that the elements of the brand-new array it returns should have
new indexes.

In fact I suspect there are more people with hidden bugs where they depend on
arrays starting at 1. This type of bug is insidious since it's hard to test
for, your application might never generate an array with a lower bound other
than 1 until someone adds some new code using array_prepend somewhere, and all
of a sudden you get strange behaviours from unrelated code.

I can have the honour of being the first squawker like you describe, but my
problem was only evidence that having such non-normalized arrays at all was
surprising. I was using int_aggregate.c which generates non-standard arrays
with lower bounds of 0. My code assumed array_upper()+1 == length. After I
dumped and restored all my counts were off by one.

--
greg

#19Greg Stark
gsstark@mit.edu
In reply to: Joe Conway (#16)
Re: Inconsistent behavior on Array & Is Null?

Joe Conway <mail@joeconway.com> writes:

Greg Stark wrote:

I'm leaning towards suggesting that postgres should follow sql-99 here and
normalize all array indexes to have a lower bound of 1. Then array_lower and
array_upper become entirely unnecessary. Instead we just have array_length
which is exactly equivalent to my idea of array_upper.

Now we finally have something to agree on ;-)

I do think this is the way to go, but it is a significant hit to backward
compatibility. Same is true for supporting NULL elements of arrays -- maybe we
should bite the bullet and make both changes at the same time?

In fact on further thought I think they *have* to be done together.

I forgot that your code did something else cool allowing updates to extend
arrays by directly updating elements outside the current bounds. Ie:

slo=> update test set a = '{}';
UPDATE 1
slo=> update test set a[1] = 1;
UPDATE 1
slo=> select * from test;
a
-----
{1}
(1 row)

But if we normalize array indexes to start at 1 then this makes it hard to
fill in the array starting at higher values. For example:

slo=> update test set a = '{}';
UPDATE 1
slo=> update test set a[5] = 5;
UPDATE 1
slo=> select a[5] from test;
a
---
5
(1 row)

This really ought to work, it obviously shouldn't allow you to set a[5] and
then surreptitiously move it to a[1]. But nor should it generate an error,
since I may well have a specific meaning for a[5] and may be planning to fill
in a[1]..a[4] later.

The logical thing to do, I think, is pre-fill the entries a[1]..a[4] with
null. This could be implemented by actually storing the NULLs or else storing
some notation that's used to adjust the base of the index to save space.

One thing that can't be made to work like it does now is extending the array
on the low end indefinitely:

slo=> update test set a[1] = 1;
UPDATE 1
slo=> update test set a[0] = 0;
UPDATE 1
slo=> update test set a[-1] = -1;
UPDATE 1
slo=> select * from test;
a
----------
{-1,0,1}
(1 row)

If this all looks familiar it's because Perl, and other languages, also behave
this way:

bash-2.05b$ perl -e '@a = (); $a[10]=10; print join(",",@a),"\n"'
,,,,,,,,,,10
bash-2.05b$ perl -e '@a = (); $a[-1]=-1; print join(",",@a),"\n"'
Modification of non-creatable array value attempted, subscript -1 at -e line 1.

--
greg

#20Greg Stark
gsstark@mit.edu
In reply to: Joe Conway (#14)
Re: Inconsistent behavior on Array & Is Null?

Joe Conway <mail@joeconway.com> writes:

Greg Stark wrote:

array_lower() and array_upper() are returning NULL for a non-null input, the
empty array, even though lower and upper bounds are known just as well as they
are for any other sized array. They are behaving as if there's something
unknown about the empty array that makes it hard to provide a lower bound or
upper bound.

Sorry, but I still disagree. There *is* something unknown about the lower and
upper bound of an empty array because there are no bounds.

So I understand your point of view now. But I think the intuitive meaning here
for lower/upper bound as the lowest/highest index where an element is present
is only a part of the picture.

lower and upper bound are also related to other properties like where
array_prepend and array_append place things. And of course the array length.

So to give a practical example, say I was implementing a stack using an array.
I push things on by extending the array by storing in arr[array_upper(arr)+1].
(Incidentally I don't think we actually have any way to shorten an array, do
we?) As soon as I pop off the last element I lose my stack pointer. I need a
special case in my code to handle pushing elements in when the array is empty.

In reality array_append() would work fine. It's only array_upper() that's
going out of its way to make things weird. There's still an upper bound,
array_append() knows it, array_upper() just hides this value from the user.

I don't see the spec defined CARDINALITY as a workaround. It defines length as
the number of elements in the array. When the array is empty, that value is
clearly 0. Nothing strange about it.

The workaround is having to handle that case with a special-case if statement.
If array_lower(), array_upper(), array_length()/CARDINALITY are all
defined in a consistent way it doesn't seem like there ought to be any special
cases in the implementations. There should be a simple rigid mathematical
relationship between them. namely "upper-lower+1 = length"

test=# select array_upper(a||b, 1), array_upper(a,1)+array_length(b) from (select '{}'::int[] as a, array[1,2] as b) as x;
array_upper | ?column?
-------------+----------
2 |
(1 row)

OK, you got me with this corner case. But using what you described as the
result int_aggregate would give you in this case (-1), you would get an even
stranger answer (-1 + 2 = 1) that would still need to be worked around.

No actually, 1 would be the correct answer, the original array would have
indexes ranging from [0,-1] and the new array would have indexes ranging from
[0,1], ie, two elements. The only strangeness is the unusual lower bound which
isn't the default for postgres arrays constructed from string literals.
Personally I prefer the zero-based indexes but as neither SQL-foo nor
backwards compatibility agree with me here I'll give that up as a lost cause
:)

--
greg

#21Joe Conway
mail@joeconway.com
In reply to: Bruce Momjian (#19)
#22Bruce Momjian
bruce@momjian.us
In reply to: Joe Conway (#21)
#23Joe Conway
mail@joeconway.com
In reply to: Bruce Momjian (#22)
#24Bruce Momjian
bruce@momjian.us
In reply to: Joe Conway (#23)