casting strings to multidimensional arrays yields strange results

Started by Kris Jurkaover 21 years ago23 messageshackersbugs
Jump to latest
#1Kris Jurka
books@ejurka.com
hackersbugs

Casting strings to multidimensional arrays yields strange results. In one
case there are discard values and the other a value magically appears.
Trying both of these with the array[] constructor syntax yields the
expected:
ERROR: multidimensional arrays must have array expressions with matching
dimensions

Tested on both 7.4.3 and 7.5dev.

Kris Jurka

jurka=# SELECT '{{1,2},{2,3},{4}}'::int[][];
int4
---------------
{{1},{2},{4}}

jurka=# SELECT '{{1},{2,3},{4,5}}'::int[][];
int4
---------------------
{{1,0},{2,3},{4,5}}

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kris Jurka (#1)
hackersbugs
Re: casting strings to multidimensional arrays yields strange results

Kris Jurka <books@ejurka.com> writes:

Casting strings to multidimensional arrays yields strange results.

array_in has fairly bizarre behavior when presented with non-rectangular
input data, such as your examples:

jurka=# SELECT '{{1,2},{2,3},{4}}'::int[][];

jurka=# SELECT '{{1},{2,3},{4,5}}'::int[][];

I don't recall the details right now of how it chooses the actual array
dimensions, but it's weird. I've been tempted to rewrite it but have
refrained for fear of breaking existing applications. Also, it's not
entirely clear what the behavior *should* be.

Right now I think the sanest behavior would be to throw an error on
non-rectangular input. Once we have support for null elements in
arrays, however, it would arguably be reasonable to pad with NULLs
where needed, so that the above would be read as

{{1,2},{2,3},{4,NULL}}

{{1,NULL},{2,3},{4,5}}

respectively. If that's the direction we want to head in, it would
probably be best to leave array_in alone until we can do that; users
tend to get unhappy when we change behavior repeatedly.

What's your thoughts?

regards, tom lane

#3Kris Jurka
books@ejurka.com
In reply to: Tom Lane (#2)
hackersbugs
Re: casting strings to multidimensional arrays yields strange

On Tue, 27 Jul 2004, Tom Lane wrote:

Right now I think the sanest behavior would be to throw an error on
non-rectangular input. Once we have support for null elements in
arrays, however, it would arguably be reasonable to pad with NULLs
where needed,

I'm just forwarding a report mentioned on irc so I have no real personal
interest. The user was really just trying to figure out how it was
supposed to work, rather than requesting a particular behavior.

Are you considering NULL padding arrays constructed with the
ARRAY[] syntax? If not then this should definitely throw an error to
match that. If we plan on moving to consistent NULL padding, perhaps now
we should consistently pad with 0 instead of sometimes padding and sometimes
truncating. This is a change along the direction we're going even if it
is an intermediate behavior.

Doing some testing along these lines for different data types makes me
think this might not be the best idea, 0 and '' seem like reasonable
defaults for numeric/text data, but for some reason 2000-01-01 is the
default for a date, and I'm sure other data types have similar problems.

jurka=# select
'{{2001-01-01},{2001-02-02,2003-03-03},{2004-02-02,2004-04-04}}'::date[][];
date
---------------------------------------------------------------------------
{{2001-01-01,2000-01-01},{2001-02-02,2003-03-03},{2004-02-02,2004-04-04}}

Kris Jurka

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kris Jurka (#3)
hackersbugs
Re: casting strings to multidimensional arrays yields strange results

Kris Jurka <books@ejurka.com> writes:

Are you considering NULL padding arrays constructed with the
ARRAY[] syntax?

Don't think anyone's really thought about it.

we should consistently pad with 0 instead of sometimes padding and sometimes
truncating.

"Pad with 0" is a meaningless concept as soon as you think about
nonnumeric data types. I'm not very sure what's even happening
inside the code --- it's a bit surprising it doesn't crash outright
on pass-by-reference data types ...

I'd agree that the truncation behavior is wrong, but I don't want to
get rid of it by causing the padding behavior to happen more often.

regards, tom lane

#5Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#2)
hackersbugs
Re: casting strings to multidimensional arrays yields strange

Tom Lane wrote:

Right now I think the sanest behavior would be to throw an error on
non-rectangular input. Once we have support for null elements in
arrays, however, it would arguably be reasonable to pad with NULLs
where needed, so that the above would be read as

{{1,2},{2,3},{4,NULL}}

{{1,NULL},{2,3},{4,5}}

respectively. If that's the direction we want to head in, it would
probably be best to leave array_in alone until we can do that; users
tend to get unhappy when we change behavior repeatedly.

I think that even once we support NULL array elements, they should be
explicitly requested -- i.e. throwing an error on non-rectangular input
is still the right thing to do. I haven't suggested that in the past
because of the backward-compatibility issue, but maybe now is the time
to bite the bullet.

If you think this qualifies as a bug fix for 7.5, I can take a look at
it next week.

Joe

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#5)
hackersbugs
Re: casting strings to multidimensional arrays yields strange results

[ cc'ing pghackers in case anyone wants to object ]

Joe Conway <mail@joeconway.com> writes:

Tom Lane wrote:

Right now I think the sanest behavior would be to throw an error on
non-rectangular input. Once we have support for null elements in
arrays, however, it would arguably be reasonable to pad with NULLs
where needed, so that the above would be read as

{{1,2},{2,3},{4,NULL}}

{{1,NULL},{2,3},{4,5}}

respectively. If that's the direction we want to head in, it would
probably be best to leave array_in alone until we can do that; users
tend to get unhappy when we change behavior repeatedly.

I think that even once we support NULL array elements, they should be
explicitly requested -- i.e. throwing an error on non-rectangular input
is still the right thing to do. I haven't suggested that in the past
because of the backward-compatibility issue, but maybe now is the time
to bite the bullet.

Okay with me. Anyone on pghackers not happy?

If you think this qualifies as a bug fix for 7.5, I can take a look at
it next week.

Yeah, we can call it a bug fix.

regards, tom lane

#7Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#6)
hackersbugs
Re: [BUGS] casting strings to multidimensional arrays yields strange

Tom Lane wrote:

[ cc'ing pghackers in case anyone wants to object ]
Joe Conway <mail@joeconway.com> writes:

I think that even once we support NULL array elements, they should be
explicitly requested -- i.e. throwing an error on non-rectangular input
is still the right thing to do. I haven't suggested that in the past
because of the backward-compatibility issue, but maybe now is the time
to bite the bullet.

Okay with me. Anyone on pghackers not happy?

If you think this qualifies as a bug fix for 7.5, I can take a look at
it next week.

Yeah, we can call it a bug fix.

The attached addresses the above issue as well as the ones mentioned in
my RFC post from yesterday found here:
http://archives.postgresql.org/pgsql-hackers/2004-08/msg00126.php

So whereas you used to get this:
test=# select '{{{1,2},{4,5,6}},{7,8}}'::int[];
int4
------
{}
(1 row)

you now get this:
test=# select '{{{1,2},{4,5,6}},{7,8}}'::int[];
ERROR: multidimensional arrays must have array expressions with
matching dimensions

Negative lower bound indicies now work also, and array_out will always
output explicit dimensions for an array with any dimension having a
lower bound of other than one. This allows the dimensions to be
preserved across pg_dump/reload cycles:

CREATE TABLE foo (
f1 integer[]
);
COPY foo (f1) FROM stdin;
[1:1][3:4][-2:0]={{{1,2,3},{4,5,6}}}
[-42:-41][12:14]={{1,2,3},{4,5,6}}
\.

test=# select f1, array_lower(f1, 1) as lb1, array_lower(f1, 2) as lb2,
array_lower(f1, 3) as lb3 from foo;
f1 | lb1 | lb2 | lb3
--------------------------------------+-----+-----+-----
[1:1][3:4][-2:0]={{{1,2,3},{4,5,6}}} | 1 | 3 | -2
[-42:-41][12:14]={{1,2,3},{4,5,6}} | -42 | 12 |
(2 rows)

If there are no objections, I'll adjust the appropriate regression
script/expected files and commit tonight. And of course I'll follow up
with documentation updates too.

Any thoughts on whether or not to apply this to 7.4?

Thanks,

Joe

Attachments:

array-fixes.1.patchtext/x-patch; name=array-fixes.1.patchDownload+102-50
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#7)
hackersbugs
Re: [BUGS] casting strings to multidimensional arrays yields strange results

Joe Conway <mail@joeconway.com> writes:

Negative lower bound indicies now work also, and array_out will always
output explicit dimensions for an array with any dimension having a
lower bound of other than one.

Seems reasonable to me.

Any thoughts on whether or not to apply this to 7.4?

I would say not; it's a sufficiently large change that it will doubtless
break a few people's applications. It's better to do such things at
major version revs. People don't expect to have to do application
compatibility testing on bug-fix updates. Also, the base bug (lack of
dimension info from array_out) has existed long enough that I don't
think there is great urgency about getting the fix into the field...

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#7)
hackersbugs
Re: [BUGS] casting strings to multidimensional arrays yields strange results

Joe Conway <mail@joeconway.com> writes:

If there are no objections, I'll adjust the appropriate regression
script/expected files and commit tonight. And of course I'll follow up
with documentation updates too.

BTW, I believe the plan is to wrap 8.0beta1 tomorrow, so please do get
this in tonight if you can get the docs updates done too. (Otherwise
it might be better to wait till the docs are done.)

regards, tom lane

#10Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#9)
hackersbugs
Re: [BUGS] casting strings to multidimensional arrays yields strange

Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

If there are no objections, I'll adjust the appropriate regression
script/expected files and commit tonight. And of course I'll follow up
with documentation updates too.

BTW, I believe the plan is to wrap 8.0beta1 tomorrow, so please do get
this in tonight if you can get the docs updates done too. (Otherwise
it might be better to wait till the docs are done.)

OK, will do. I ought to be able to finish it up today and commit tonight.

Thanks,

Joe

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kris Jurka (#1)
hackersbugs
Re: [BUGS] casting strings to multidimensional arrays yields strange results

Joe Conway <mail@joeconway.com> writes:

OK, this is done, but there is still an ugly bug in there (i.e. was
there prior to my just committed changes and still exists) somewhere
that I noticed while modifying domain.sql. Namely, if you forget to put
a delimiter inbetween subarrays in a multidimensional array literal, you
get odd results:

I had pretty much come to the conclusion that the array_in code should
be thrown away and rewritten from the ground up --- whoever wrote it
originally had no calling as a programmer :-(. I didn't look at the
details of your patch, but I have little faith in half measures in this
area. I've already wasted a lot of time trying to patch that code in
past releases.

regards, tom lane

#12Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#11)
hackersbugs
Re: [BUGS] casting strings to multidimensional arrays yields strange

Tom Lane wrote:

I had pretty much come to the conclusion that the array_in code should
be thrown away and rewritten from the ground up --- whoever wrote it
originally had no calling as a programmer :-(. I didn't look at the
details of your patch, but I have little faith in half measures in this
area. I've already wasted a lot of time trying to patch that code in
past releases.

Agreed, but I don't think I'll be able to get that done tonight.

While looking at it the last day or so, I started to think it might be
better to use bison to parse array literals -- or is that a bad idea?

In any case, I'm planning to find the time to work on NULL array
elements as soon as we branch 8.0 from head. At that time I'll also look
at cleaning up array_in (and friends undoubtedly).

Joe

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#12)
hackersbugs
Re: [BUGS] casting strings to multidimensional arrays yields strange results

Joe Conway <mail@joeconway.com> writes:

While looking at it the last day or so, I started to think it might be
better to use bison to parse array literals -- or is that a bad idea?

Offhand it doesn't seem like a super-appropriate tool. Once you get
past the lexical details like quoting, the syntax of array literals
is not complicated enough to need a bison parser. Also, the issues
you're facing now like enforcing consistent dimensions are not amenable
to solution by a context-free grammar --- so you'd still need most of
the dimension-checking mechanisms.

There might be something out there that is more useful, but I dunno
what.

regards, tom lane

#14Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#13)
hackersbugs
Re: [BUGS] casting strings to multidimensional arrays yields strange

Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

While looking at it the last day or so, I started to think it might be
better to use bison to parse array literals -- or is that a bad idea?

Offhand it doesn't seem like a super-appropriate tool. Once you get
past the lexical details like quoting, the syntax of array literals
is not complicated enough to need a bison parser. Also, the issues
you're facing now like enforcing consistent dimensions are not amenable
to solution by a context-free grammar --- so you'd still need most of
the dimension-checking mechanisms.

I'm hesitant to apply the attached this late before the beta without
review, but it seems to take care of the pathological cases I came up
with, doesn't break anything AFAICS, and passes all regression tests. I
guess it can go into beta 2.

Joe

Attachments:

array-fixes-state.1.patchtext/x-patch; name=array-fixes-state.1.patchDownload+103-19
#15Joe Conway
mail@joeconway.com
In reply to: Joe Conway (#14)
hackersbugs
Re: [PATCHES] [BUGS] casting strings to multidimensional arrays yields

Joe Conway wrote:

I'm hesitant to apply the attached this late before the beta without
review, but it seems to take care of the pathological cases I came up
with, doesn't break anything AFAICS, and passes all regression tests. I
guess it can go into beta 2.

I've continued to hack on array literal parsing today, and in doing so
came up with a question regarding behavior. Currently the following works:

regression=# select '{0 second,0 second}'::interval[];
interval
---------------------
{00:00:00,00:00:00}
(1 row)

However, since we don't require an element with embedded spaces to be
quoted, it also means that whitespace just before the delimiter is
significant (even though leading whitespace is not) because there is no
way to know when the element is complete:

regression=# select '{ 0 second ,0 second}'::text[];
text
----------------------------
{"0 second ","0 second"}
(1 row)

I view the current behavior as a bug. While making changes, I'd like to
require elements with embedded whitespace to be quoted so that the above
does this:

regression=# select '{0 second,0 second}'::text[];
ERROR: malformed array literal: "{0 second,0 second}"

Additionally I'd like to make whitespace before and after quoted or
unquoted elements insignificant. Any comments?

Thanks,

Joe

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#15)
hackersbugs
Re: [PATCHES] [BUGS] casting strings to multidimensional arrays yields strange

Joe Conway <mail@joeconway.com> writes:

... whitespace just before the delimiter is
significant (even though leading whitespace is not)

Yeah. This has been the documented behavior for quite some time, but
I can't say that I ever liked it.

I view the current behavior as a bug. While making changes, I'd like to
require elements with embedded whitespace to be quoted so that the above
does this:

regression=# select '{0 second,0 second}'::text[];
ERROR: malformed array literal: "{0 second,0 second}"

Additionally I'd like to make whitespace before and after quoted or
unquoted elements insignificant. Any comments?

I think that suppressing unquoted trailing whitespace is probably
reasonable. I'm much less enthusiastic about rejecting unquoted
embedded whitespace, though. I think that's significantly likely
to break apps, and it doesn't seem like it's really needed to have
sane behavior. The leading/trailing whitespace asymmetry is just
weird, but it doesn't follow that embedded whitespace is weird.

If we are going to make such changes, 8.0 is the right time to do it.

regards, tom lane

#17Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#16)
hackersbugs
Re: [PATCHES] [BUGS] casting strings to multidimensional arrays yields

Tom Lane wrote:

I think that suppressing unquoted trailing whitespace is probably
reasonable. I'm much less enthusiastic about rejecting unquoted
embedded whitespace, though. I think that's significantly likely
to break apps, and it doesn't seem like it's really needed to have
sane behavior. The leading/trailing whitespace asymmetry is just
weird, but it doesn't follow that embedded whitespace is weird.

If we are going to make such changes, 8.0 is the right time to do it.

The attached patch suppresses trailing whitespace, but allows embedded
whitespace in unquoted elements as discussed above. It also rejects some
previously accepted cases that were just too strange to be correct:

-- Postgres 8.0, with the patch
-- none of these should be accepted
select '{{1,{2}},{2,3}}'::text[];
ERROR: malformed array literal: "{{1,{2}},{2,3}}"
select '{{},{}}'::text[];
ERROR: malformed array literal: "{{},{}}"
select '{{1,2},\\{2,3}}'::text[];
ERROR: malformed array literal: "{{1,2},\{2,3}}"
select '{{"1 2" x},{3}}'::text[];
ERROR: malformed array literal: "{{"1 2" x},{3}}"

The third case above actually does get an ERROR in 7.4 but it looks
suspicious itself:

-- in Postgres 7.4
select '{{1,2},\\{2,3}}'::text[];
ERROR: malformed array literal: "{{1"

More examples:

-- Postgres 8.0, with the patch
-- all of these should be accepted
select '{}'::text[];
text
------
{}
(1 row)

select '{0 second ,0 second}'::interval[];
interval
---------------------
{00:00:00,00:00:00}
(1 row)

select '{ { "," } , { 3 } }'::text[];
text
-------------
{{","},{3}}
(1 row)

select ' { { " 0 second " , 0 second } }'::text[];
text
-------------------------------
{{" 0 second ","0 second"}}
(1 row)

If there are no objections, I'll update the docs and commit tomorrow
sometime.

Thanks,

Joe

Attachments:

array-fixes-state.2.patchtext/x-patch; name=array-fixes-state.2.patchDownload+164-74
#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#17)
hackersbugs
Re: [PATCHES] [BUGS] casting strings to multidimensional arrays yields strange

Joe Conway <mail@joeconway.com> writes:

The attached patch suppresses trailing whitespace, but allows embedded
whitespace in unquoted elements as discussed above. It also rejects some
previously accepted cases that were just too strange to be correct:

-- Postgres 8.0, with the patch
-- none of these should be accepted
select '{{1,{2}},{2,3}}'::text[];
ERROR: malformed array literal: "{{1,{2}},{2,3}}"

Okay, uneven nesting level of {} is clearly bogus.

select '{{},{}}'::text[];
ERROR: malformed array literal: "{{},{}}"

Hm. This seems like it would be a legitimate representation of a 2x0
array. Why would you allow '{}' and reject this?

select '{{1,2},\\{2,3}}'::text[];
ERROR: malformed array literal: "{{1,2},\{2,3}}"

Okay, junk outside the {} structure is bad.

select '{{"1 2" x},{3}}'::text[];
ERROR: malformed array literal: "{{"1 2" x},{3}}"

So here you're rejecting the first data value because it's partially
quoted and partially not? I guess this is arguably reasonable, but
I'm not sure that it's really necessary either. Historically array_in
has taken this sort of thing.

The third case above actually does get an ERROR in 7.4 but it looks
suspicious itself:

select '{{1,2},\\{2,3}}'::text[];
ERROR: malformed array literal: "{{1"

This is something I was planning to fix myself: ReadArrayStr is using
arrayStr as the string to complain about in its error messages, but that
is the same string that it is scribbling on by overwriting with \0 where
it wants to terminate an individual data value. So you get bogus
messages anytime a syntax error is detected after having absorbed at
least one item value.

The easiest way to fix it seems to be for array_in to pass an additional
parameter which is the original non-overwritable input string, and use
that in the ereport calls.

More examples:

These look okay. Didn't study the code though.

regards, tom lane

#19Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#18)
hackersbugs
Re: [PATCHES] [BUGS] casting strings to multidimensional arrays yields

Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

select '{{},{}}'::text[];
ERROR: malformed array literal: "{{},{}}"

Hm. This seems like it would be a legitimate representation of a 2x0
array. Why would you allow '{}' and reject this?

Well, '{}' is a special case representing an empty array. One of the
characteristics of an empty array is that is has 0 dimensions. An empty
array can later be treated as an array of any dimension, e.g.:

regression=# select '{}'::int[] || ARRAY[1];
?column?
----------
{1}
(1 row)

regression=# select '{}'::int[] || ARRAY[[1]];
?column?
----------
{{1}}
(1 row)

In my mind, '{{},{}}' clearly defines a two dimensional array, and
therefore needs elements, which in this case would have to be NULL (or
empty strings -- see below). Once we can deal with NULL elements, I'd
think '{{},{}}' ought to be accepted, and produce a 2d array with 2 NULL
elements. Note how this works in 7.4:

select '{{},{}}'::text[];
text
-------------
{{""},{""}}
(1 row)

I thought creation of empty strings was what we agreed the other day to
eliminate?

select '{{"1 2" x},{3}}'::text[];
ERROR: malformed array literal: "{{"1 2" x},{3}}"

So here you're rejecting the first data value because it's partially
quoted and partially not? I guess this is arguably reasonable, but
I'm not sure that it's really necessary either. Historically array_in
has taken this sort of thing.

I know is has accepted this, but I always found it surprising and
strange. And I believe we've had others complain about this behavior. It
seems to me that if you are going to the trouble to quote the item,
why would you want random garbage outside the quotes to get magically
appended?

The third case above actually does get an ERROR in 7.4 but it looks
suspicious itself:

select '{{1,2},\\{2,3}}'::text[];
ERROR: malformed array literal: "{{1"

This is something I was planning to fix myself: ReadArrayStr is using
arrayStr as the string to complain about in its error messages, but that
is the same string that it is scribbling on by overwriting with \0 where
it wants to terminate an individual data value. So you get bogus
messages anytime a syntax error is detected after having absorbed at
least one item value.

The easiest way to fix it seems to be for array_in to pass an additional
parameter which is the original non-overwritable input string, and use
that in the ereport calls.

Ah, thanks for the explanation. I'll fix that too.

Thanks,

Joe

#20Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#18)
hackersbugs
Re: [PATCHES] [BUGS] casting strings to multidimensional arrays yields

Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

select '{{},{}}'::text[];
ERROR: malformed array literal: "{{},{}}"

Hm. This seems like it would be a legitimate representation of a 2x0
array. Why would you allow '{}' and reject this?

I just reread your comment and thought about it more. I think most of
what I said in the last post holds, but in addition part of the problem
lies in the fact that we don't implement multidimensional arrays as
nested arrays, but rather as one monolithic structure. If we did the
former, then 2x0 would be feasible.

Investigating, and possibly implementing, multidimensional arrays as a
nested structure is another item on my wish list. I'm still not entirely
sure how difficult or desirable it is, but my interpretation of SQL99 is
that nested arrays is the requirement.

Joe

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#19)
hackersbugs
#22Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#21)
hackersbugs
In reply to: Joe Conway (#20)
hackersbugs