string_to_array with an empty input string

Started by Tom Laneover 15 years ago30 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

Looking through Pavel's string_to_array patch, I notice that the new
version of string_to_array returns an empty (zero-element) array when
the input string is of zero length, whereas the traditional version
returned NULL instead. The patch fails to emulate the old behavior
exactly, but rather than fix it, I wonder if we shouldn't make the two
versions behave the same. (If we don't have them doing the same thing,
we're going to have to document the difference...) The NULL result
seems a bit peculiar to me, and the empty-array result saner. Given
the general inconsistency surrounding empty arrays elsewhere, I think
this wouldn't be a very problematic change for most users.

Thoughts?

regards, tom lane

#2Thom Brown
thom@linux.com
In reply to: Tom Lane (#1)
Re: string_to_array with an empty input string

On 10 August 2010 19:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Looking through Pavel's string_to_array patch, I notice that the new
version of string_to_array returns an empty (zero-element) array when
the input string is of zero length, whereas the traditional version
returned NULL instead.  The patch fails to emulate the old behavior
exactly, but rather than fix it, I wonder if we shouldn't make the two
versions behave the same.  (If we don't have them doing the same thing,
we're going to have to document the difference...)  The NULL result
seems a bit peculiar to me, and the empty-array result saner.  Given
the general inconsistency surrounding empty arrays elsewhere, I think
this wouldn't be a very problematic change for most users.

Thoughts?

I, personally, would expect an empty array output given an empty
input, and a null output for a null input.
--
Thom Brown
Registered Linux user: #516935

#3David E. Wheeler
david@kineticode.com
In reply to: Thom Brown (#2)
Re: string_to_array with an empty input string

On Aug 10, 2010, at 11:46 AM, Thom Brown wrote:

I, personally, would expect an empty array output given an empty
input, and a null output for a null input.

+1

David

In reply to: David E. Wheeler (#3)
Re: string_to_array with an empty input string

On 10 August 2010 19:48, David E. Wheeler <david@kineticode.com> wrote:

On Aug 10, 2010, at 11:46 AM, Thom Brown wrote:

I, personally, would expect an empty array output given an empty
input, and a null output for a null input.

+1

Agreed. After all, the result isn't indeterminate - it's an empty
array. Some people might think that it's useful for the result to be
NULL, but they'd probably also think that it's useful for an empty
string to be NULL.

--
Regards,
Peter Geoghegan

#5Bruce Momjian
bruce@momjian.us
In reply to: Peter Geoghegan (#4)
Re: string_to_array with an empty input string

On Wed, Aug 11, 2010 at 12:37 AM, Peter Geoghegan
<peter.geoghegan86@gmail.com> wrote:

On 10 August 2010 19:48, David E. Wheeler <david@kineticode.com> wrote:

On Aug 10, 2010, at 11:46 AM, Thom Brown wrote:

I, personally, would expect an empty array output given an empty
input, and a null output for a null input.

+1

Agreed. After all, the result isn't indeterminate - it's an empty
array. Some people might think that it's useful for the result to be
NULL, but they'd probably also think that it's useful for an empty
string to be NULL.

For what it's worth there are two reasonable return values for
string_to_array(''). It could be [] or it could be ['']. There are
applications where the former makes the most sense and there are
applications where the latter makes the most sense.

Ideally you really want string_to_array(array_to_string(x, ':'),':')
to return x. There's no safe return value to pick for the cases where
x=[''] and x=[] that will make this work.

--
greg

#6Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#5)
Re: string_to_array with an empty input string

There's already been one rather-long thread on this topic.

http://thread.gmane.org/gmane.comp.db.postgresql.general/121450

In there I argue for the empty array interpretation and Tom goes back
and forth a few times. I'm not sure where that thread ended though.

--
greg

#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Bruce Momjian (#5)
Re: string_to_array with an empty input string

2010/8/11 Greg Stark <gsstark@mit.edu>:

On Wed, Aug 11, 2010 at 12:37 AM, Peter Geoghegan
<peter.geoghegan86@gmail.com> wrote:

On 10 August 2010 19:48, David E. Wheeler <david@kineticode.com> wrote:

On Aug 10, 2010, at 11:46 AM, Thom Brown wrote:

I, personally, would expect an empty array output given an empty
input, and a null output for a null input.

+1

Agreed. After all, the result isn't indeterminate - it's an empty
array. Some people might think that it's useful for the result to be
NULL, but they'd probably also think that it's useful for an empty
string to be NULL.

For what it's worth there are two reasonable return values for
string_to_array(''). It could be [] or it could be ['']. There are
applications where the former makes the most sense and there are
applications where the latter makes the most sense.

you have a true. The safe solution is return NULL on empty string. But
this behave is pretty unpractical for all domains other than texts. On
numeric or date there are not possible described situation.

I have a two ideas, just ideas:

a) to create a text_to_array function as complement to string_to_array
function. This function is same as string_to_array, but empty string
can be a NULL. But I see it as too academical.

b) to create a functions "explode" for other than text domains. One
parameter can be a regtype of expected array (maybe element). Then we
can correctly to decide what is correct result for empty string, and
we can to safe a some memory/time because the result will not be a
short life text array but desired array.

explode('1,2,3,4,,5', ',', '', int[])

Similar function have to be implemented with parser and transformation
changes - so we can design this function more verbose if we want:

explode('1,2,3,4,5,,' TO int[] DELIMITER AS ',' NULL AS '')

delimiter and nullstr can be a optional.

c) do nothing and returns NULL for empty string :(

I like a variant b.

Regards

Pavel Stehule

Show quoted text

Ideally you really want string_to_array(array_to_string(x, ':'),':')
to return x. There's no safe return value to pick for the cases where
x=[''] and x=[] that will make this work.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#5)
Re: string_to_array with an empty input string

Greg Stark <gsstark@mit.edu> writes:

Ideally you really want string_to_array(array_to_string(x, ':'),':')
to return x. There's no safe return value to pick for the cases where
x=[''] and x=[] that will make this work.

It's easy to see that string_to_array/array_to_string are *not* usable
as general-purpose serialize/deserialize operations, so sweating over
corner cases like this one seems a bit pointless. The design center
for 'em seems to be array elements that are numbers, so there's no
issue with empty strings and no great difficulty in picking delimiters
and null representations that can't appear in the data. I think they're
essentially worthless for arrays of text.

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#6)
Re: string_to_array with an empty input string

Greg Stark <gsstark@mit.edu> writes:

There's already been one rather-long thread on this topic.
http://thread.gmane.org/gmane.comp.db.postgresql.general/121450

In there I argue for the empty array interpretation and Tom goes back
and forth a few times. I'm not sure where that thread ended though.

I had forgotten that discussion. It looks like we trailed off without
any real consensus: there was about equal sentiment for an array with
zero elements and an array with one empty-string element. We ended
up leaving it alone because (a) that wouldn't break anything and (b)
you could use COALESCE() to substitute whichever behavior your
application needed for the case.

So maybe we need to revisit the issue. Pavel was claiming that
switching to a zero-element array result was a no-brainer, but evidently
it isn't so. Is anybody still excited about the alternatives?

regards, tom lane

#10Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)
Re: string_to_array with an empty input string

On Wed, Aug 11, 2010 at 10:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Greg Stark <gsstark@mit.edu> writes:

Ideally you really want string_to_array(array_to_string(x, ':'),':')
to return x. There's no safe return value to pick for the cases where
x=[''] and x=[] that will make this work.

It's easy to see that string_to_array/array_to_string are *not* usable
as general-purpose serialize/deserialize operations, so sweating over
corner cases like this one seems a bit pointless.  The design center
for 'em seems to be array elements that are numbers, so there's no
issue with empty strings and no great difficulty in picking delimiters
and null representations that can't appear in the data.  I think they're
essentially worthless for arrays of text.

array_to_string() is quite useful for arrays of text; I use it to
generate human-readable output, by setting the delimiter to ', '.

Whether string_to_array() is useful is another matter. It probably is
in some cases, but putting parsing logic into the database layer tends
to be a bit klunky, unless you know from context that you needn't
worry about the error cases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#11David E. Wheeler
david@kineticode.com
In reply to: Tom Lane (#9)
Re: string_to_array with an empty input string

On Aug 11, 2010, at 7:41 AM, Tom Lane wrote:

I had forgotten that discussion. It looks like we trailed off without
any real consensus: there was about equal sentiment for an array with
zero elements and an array with one empty-string element. We ended
up leaving it alone because (a) that wouldn't break anything and (b)
you could use COALESCE() to substitute whichever behavior your
application needed for the case.

So maybe we need to revisit the issue. Pavel was claiming that
switching to a zero-element array result was a no-brainer, but evidently
it isn't so. Is anybody still excited about the alternatives?

% perl -E 'say q{"}, join(",", ""), q{"}'
""
% ruby -e 'puts %q{"} + [""].join(",") + %q{"}'
""
% python -c 'print "\"" + ",".join([""]) + "\""'
""

I believe those are all "", rather than '"' + undef + '"'.

Best,

David

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: David E. Wheeler (#11)
Re: string_to_array with an empty input string

"David E. Wheeler" <david@kineticode.com> writes:

On Aug 11, 2010, at 7:41 AM, Tom Lane wrote:

So maybe we need to revisit the issue. Pavel was claiming that
switching to a zero-element array result was a no-brainer, but evidently
it isn't so. Is anybody still excited about the alternatives?

% perl -E 'say q{"}, join(",", ""), q{"}'
""
% ruby -e 'puts %q{"} + [""].join(",") + %q{"}'
""
% python -c 'print "\"" + ",".join([""]) + "\""'
""

I believe those are all "", rather than '"' + undef + '"'.

If you believe my previous opinion that the design center for these
functions is arrays of numbers, then a zero-entry text[] array is what
you want, because you can successfully cast it to a zero-entry array of
integers or floats or whatever. Returning a single empty string will
make those cases fail. So at the moment I'm on the side of the fence
that says zero-entry array is the best answer.

regards, tom lane

#13Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#12)
Re: string_to_array with an empty input string

On Wed, Aug 11, 2010 at 12:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David E. Wheeler" <david@kineticode.com> writes:

On Aug 11, 2010, at 7:41 AM, Tom Lane wrote:

So maybe we need to revisit the issue.  Pavel was claiming that
switching to a zero-element array result was a no-brainer, but evidently
it isn't so.  Is anybody still excited about the alternatives?

% perl -E 'say q{"}, join(",", ""), q{"}'
""
% ruby -e 'puts %q{"} + [""].join(",") + %q{"}'
""
% python -c 'print "\"" + ",".join([""]) + "\""'
""

I believe those are all "", rather than '"' + undef + '"'.

If you believe my previous opinion that the design center for these
functions is arrays of numbers, then a zero-entry text[] array is what
you want, because you can successfully cast it to a zero-entry array of
integers or floats or whatever.  Returning a single empty string will
make those cases fail.  So at the moment I'm on the side of the fence
that says zero-entry array is the best answer.

Yeah, I think David's examples are talking about the behavior of join,
but we're trying to decide what split should do. I think the main
argument for making it return NULL is that you can then fairly easily
use COALESCE() to get whatever you want. That's a bit more difficult
if you use return any other value. But I think your argument that an
empty array is better than a one-element array containing an empty
string is very much correct, as between those options.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

In reply to: Robert Haas (#13)
Re: string_to_array with an empty input string

Yeah, I think David's examples are talking about the behavior of join,
but we're trying to decide what split should do.  I think the main
argument for making it return NULL is that you can then fairly easily
use COALESCE() to get whatever you want.  That's a bit more difficult
if you use return any other value.

I think that there's a need for additional built-in array functions,
including one to succinctly test if an array has no elements.
Iterating through an array with plpgsql, for example, is more clunky
than it should be.

--
Regards,
Peter Geoghegan

#15David E. Wheeler
david@kineticode.com
In reply to: Tom Lane (#12)
Re: string_to_array with an empty input string

On Aug 11, 2010, at 9:36 AM, Tom Lane wrote:

I believe those are all "", rather than '"' + undef + '"'.

If you believe my previous opinion that the design center for these
functions is arrays of numbers, then a zero-entry text[] array is what
you want, because you can successfully cast it to a zero-entry array of
integers or floats or whatever. Returning a single empty string will
make those cases fail. So at the moment I'm on the side of the fence
that says zero-entry array is the best answer.

Seems to be precedent for that:

% perl -E 'say scalar @{[ split ",", ""]}'
0

Best,

David

#16David E. Wheeler
david@kineticode.com
In reply to: Robert Haas (#13)
Re: string_to_array with an empty input string

On Aug 11, 2010, at 9:40 AM, Robert Haas wrote:

Yeah, I think David's examples are talking about the behavior of join,
but we're trying to decide what split should do.

Right, sorry about that.

I think the main
argument for making it return NULL is that you can then fairly easily
use COALESCE() to get whatever you want. That's a bit more difficult
if you use return any other value. But I think your argument that an
empty array is better than a one-element array containing an empty
string is very much correct, as between those options.

I prefer an empty array.

Best,

David

#17Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#14)
Re: string_to_array with an empty input string

On Wed, Aug 11, 2010 at 1:32 PM, Peter Geoghegan
<peter.geoghegan86@gmail.com> wrote:

Yeah, I think David's examples are talking about the behavior of join,
but we're trying to decide what split should do.  I think the main
argument for making it return NULL is that you can then fairly easily
use COALESCE() to get whatever you want.  That's a bit more difficult
if you use return any other value.

I think that there's a need for additional built-in array functions,
including one to succinctly test if an array has no elements.

What do you propose? I think the easiest ways to do it right now are:

array_length(arr, 1) is null

or just using an equality test, like this:

arr = '{}'::int[]

Iterating through an array with plpgsql, for example, is more clunky
than it should be.

Really?

FOR var IN SELECT UNNEST(arr) LOOP ... END LOOP

I mean, doing everything is sort of clunky in PL/pgsql, but this
doesn't seem particularly bad as PL/pgsql idioms go.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#18David E. Wheeler
david@kineticode.com
In reply to: Robert Haas (#17)
Re: string_to_array with an empty input string

On Aug 11, 2010, at 10:53 AM, Robert Haas wrote:

Iterating through an array with plpgsql, for example, is more clunky
than it should be.

Really?

FOR var IN SELECT UNNEST(arr) LOOP ... END LOOP

I mean, doing everything is sort of clunky in PL/pgsql, but this
doesn't seem particularly bad as PL/pgsql idioms go.

That tends to over-flatten if you have nested arrays and just want to iterate over the top level. In that case you must use generate_subscripts().

Best,

David

#19Andrew Dunstan
andrew@dunslane.net
In reply to: David E. Wheeler (#18)
Re: string_to_array with an empty input string

On 08/11/2010 01:54 PM, David E. Wheeler wrote:

On Aug 11, 2010, at 10:53 AM, Robert Haas wrote:

Iterating through an array with plpgsql, for example, is more clunky
than it should be.

Really?

FOR var IN SELECT UNNEST(arr) LOOP ... END LOOP

I mean, doing everything is sort of clunky in PL/pgsql, but this
doesn't seem particularly bad as PL/pgsql idioms go.

That tends to over-flatten if you have nested arrays and just want to iterate over the top level. In that case you must use generate_subscripts().

for i in array_lower(myarray,1) .. array_upper(myarray,1) loop ...

works well

cheers

andrew

#20David E. Wheeler
david@kineticode.com
In reply to: Andrew Dunstan (#19)
Re: string_to_array with an empty input string

On Aug 11, 2010, at 10:58 AM, Andrew Dunstan wrote:

for i in array_lower(myarray,1) .. array_upper(myarray,1) loop ...

works well

for i in select array_subscripts(myarray, 1) loop ...

Best,

David

#21Andrew Dunstan
andrew@dunslane.net
In reply to: David E. Wheeler (#20)
#22David E. Wheeler
david@kineticode.com
In reply to: Andrew Dunstan (#21)
#23Andrew Dunstan
andrew@dunslane.net
In reply to: David E. Wheeler (#22)
In reply to: Robert Haas (#17)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#24)
#26Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#24)
In reply to: Robert Haas (#26)
#28Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#17)
#29Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Robert Haas (#26)
#30Pavel Stehule
pavel.stehule@gmail.com
In reply to: Dimitri Fontaine (#29)