Array behavior oddities
In reading our array documentation I came across two unusual behaviors.
The issue relates to slices:
We can also access arbitrary rectangular slices of an array, or
subarrays. An array slice is denoted by writing
<literal><replaceable>lower-bound</replaceable>:<replaceable>upper-bound</replaceable></literal>
for one or more array dimensions. For example, this query retrieves
the first item on Bill's schedule for the first two days of the week:
SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';
First issue:
If any dimension is written as a slice, i.e. contains a colon, then all
dimensions are treated as slices. Any dimension that has only a single
number (no colon) is treated as being from <literal>1</> to the number
specified. For example, <literal>[2]</> is treated as <literal>[1:2]</>,
as in this example:
Is the the behavior of assuming an entry with no colon is a slice what
we want, or are we just stuck with it?
Also:
An array subscript expression will return null if either the array itself
or any of the subscript expressions are null. Also, null is returned
if a subscript is outside the array bounds (this case does not raise an
error). For example, if <literal>schedule</> currently has the
dimensions <literal>[1:3][1:2]</> then referencing
<literal>schedule[3][3]</> yields NULL. Similarly, an array reference
with the wrong number of subscripts yields a null rather than an error.
An array slice expression likewise yields null if the array itself or
any of the subscript expressions are null. However, in other corner
cases such as selecting an array slice that is completely outside the
current array bounds, a slice expression yields an empty
(zero-dimensional) array instead of null. If the requested slice
partially overlaps the array bounds, then it is silently reduced to just
the overlapping region.
Is there a reason out-of-bounds array accesses behave differently for
slices and non-slices?
Having slices and non-slices behave differently is very confusing to me.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
If any dimension is written as a slice, i.e. contains a colon, then all
dimensions are treated as slices.
Is the the behavior of assuming an entry with no colon is a slice what
we want, or are we just stuck with it?
Why do you find that surprising? It's either a slice or it isn't,
there's no halfway point. Are you proposing to throw an error if only
some of the subscripts have colons? What would be the point?
Is there a reason out-of-bounds array accesses behave differently for
slices and non-slices?
History (although "sloppy original implementation" would do too). I'm
not sure if we should try to clean it up --- there've not been that many
complaints, but I'm sure we'd get complaints from people whose code
stopped working, if we change it.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
If any dimension is written as a slice, i.e. contains a colon, then all
dimensions are treated as slices.Is the the behavior of assuming an entry with no colon is a slice what
we want, or are we just stuck with it?Why do you find that surprising? It's either a slice or it isn't,
there's no halfway point. Are you proposing to throw an error if only
some of the subscripts have colons? What would be the point?
What is confusing is if I see [1:3][2]means one element., I assume that is [1:3][2:2], not
[1:3][1:2]. By what logic does [2]means one element. mean from 1 to 2?
For example, in [1:3][2]means one element., [2]means one element. means [1:2], two elements, but in [1][2]means one element.,
[2]: means one element.
Is there a reason out-of-bounds array accesses behave differently for
slices and non-slices?History (although "sloppy original implementation" would do too). I'm
not sure if we should try to clean it up --- there've not been that many
complaints, but I'm sure we'd get complaints from people whose code
stopped working, if we change it.
OK, so there is no grand design I am missing; it is just a wart in our
implementation, that at least we document.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
"Bruce Momjian" <bruce@momjian.us> writes:
Tom Lane wrote:
Why do you find that surprising? It's either a slice or it isn't,
there's no halfway point. Are you proposing to throw an error if only
some of the subscripts have colons? What would be the point?What is confusing is if I see [1:3][2], I assume that is [1:3][2:2], not
[1:3][1:2]. By what logic does [2] mean from 1 to 2?
Here's something else which confused me just now. Why does the second query
return NULL instead of an array slice?
postgres=# select (array['{foo}'::text[],'{bar}'])[1][1];
array
-------
foo
(1 row)
postgres=# select (array['{foo}'::text[],'{bar}'])[1];
array
-------
(1 row)
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
Gregory Stark <stark@enterprisedb.com> writes:
Here's something else which confused me just now. Why does the second query
return NULL instead of an array slice?
Because it isn't a slice expression --- you used colon nowhere, so the
result type is going to be text not text[]. (Remember that the parser
must determine the expression's result type at parse time, so whether
there are enough subscripts can't enter into this.) Our alternatives
here are to throw a subscripting error or return NULL. I'd personally
have gone with throwing an error, I think, but it seems far too late to
revisit that decision.
regards, tom lane
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Bruce Momjian wrote:
An array subscript expression will return null if either the array itself
or any of the subscript expressions are null. Also, null is returned
if a subscript is outside the array bounds (this case does not raise an
error). For example, if <literal>schedule</> currently has the
dimensions <literal>[1:3][1:2]</> then referencing
<literal>schedule[3][3]</> yields NULL. Similarly, an array reference
with the wrong number of subscripts yields a null rather than an error.An array slice expression likewise yields null if the array itself or
any of the subscript expressions are null. However, in other corner
cases such as selecting an array slice that is completely outside the
current array bounds, a slice expression yields an empty
(zero-dimensional) array instead of null. If the requested slice
partially overlaps the array bounds, then it is silently reduced to just
the overlapping region.Is there a reason out-of-bounds array accesses behave differently for
slices and non-slices?Having slices and non-slices behave differently is very confusing to me.
I think the case of partially-out-of-bound slices is a good reason to have
this difference:
fastgraph=# select ('{foo,bar}'::text[])[1:2];
text
- -----------
{foo,bar}
(1 row)
fastgraph=# select ('{foo,bar}'::text[])[2:3];
text
- -------
{bar}
(1 row)
fastgraph=# select ('{foo,bar}'::text[])[3:4];
text
- ------
{}
(1 row)
We cannot return an empty array in case of unsliced out-of-bounds access
because the type wouldn't match at all.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
iD8DBQFHjdsOzhchXT4RR5ARAvBvAKCGVxgl6u2ZUcB/Bvl2jPN2/p6hzACdFXE3
9w01URr/xPYukzHhD5qhudE=
=iZxq
-----END PGP SIGNATURE-----
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
If any dimension is written as a slice, i.e. contains a colon, then all
dimensions are treated as slices.Is the the behavior of assuming an entry with no colon is a slice what
we want, or are we just stuck with it?Why do you find that surprising? It's either a slice or it isn't,
there's no halfway point. Are you proposing to throw an error if only
some of the subscripts have colons? What would be the point?Is there a reason out-of-bounds array accesses behave differently for
slices and non-slices?History (although "sloppy original implementation" would do too). I'm
not sure if we should try to clean it up --- there've not been that many
complaints, but I'm sure we'd get complaints from people whose code
stopped working, if we change it.
I have updated the documention (patch attached) to be clearer on the odd
array slice behavior.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Attachments:
/rtmp/difftext/x-diffDownload
Index: doc/src/sgml/array.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/array.sgml,v
retrieving revision 1.62
diff -c -c -r1.62 array.sgml
*** doc/src/sgml/array.sgml 7 Jun 2007 14:49:56 -0000 1.62
--- doc/src/sgml/array.sgml 26 Mar 2008 14:42:14 -0000
***************
*** 258,263 ****
--- 258,266 ----
{{meeting,lunch},{training,presentation}}
(1 row)
</programlisting>
+
+ To avoid confusion with slices, use slice syntax for all dimmension
+ references, e.g. <literal>[1:2][1:1]</>, not <literal>[2][1:1]</>.
</para>
<para>
***************
*** 275,281 ****
any of the subscript expressions are null. However, in other corner
cases such as selecting an array slice that
is completely outside the current array bounds, a slice expression
! yields an empty (zero-dimensional) array instead of null.
If the requested slice partially overlaps the array bounds, then it
is silently reduced to just the overlapping region.
</para>
--- 278,285 ----
any of the subscript expressions are null. However, in other corner
cases such as selecting an array slice that
is completely outside the current array bounds, a slice expression
! yields an empty (zero-dimensional) array instead of null. (This
! does not match non-slice behavior and is done for historical reasons.)
If the requested slice partially overlaps the array bounds, then it
is silently reduced to just the overlapping region.
</para>