string_to_array, array_to_string function without separator

Started by Pavel Stehule almost 7 years ago, 13 messages
#1Pavel Stehule
pavel.stehule@gmail.com

Hi

I propose adding variants of the functions named in the subject that take no
separator. In the first case the string is transformed into an array of
characters; in the second case the array of characters is transformed back
into a string.

Comments, notes?

Regards

Pavel

#2David Fetter
david@fetter.org
In reply to: Pavel Stehule (#1)
Re: string_to_array, array_to_string function without separator

On Fri, Mar 15, 2019 at 05:04:02AM +0100, Pavel Stehule wrote:

Hi

I propose mentioned functions without specified separator. In this case the
string is transformed to array of chars, in second case, the array of chars
is transformed back to string.

Comments, notes?

Whatever optimizations you have in mind for this, could they also work
for string_to_array() and array_to_string() when they get an empty
string handed to them?

As to naming, some languages use explode/implode.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Fetter (#2)
Re: string_to_array, array_to_string function without separator

On Fri, Mar 15, 2019 at 15:03, David Fetter <david@fetter.org> wrote:

On Fri, Mar 15, 2019 at 05:04:02AM +0100, Pavel Stehule wrote:

Hi

I propose mentioned functions without specified separator. In this case the
string is transformed to array of chars, in second case, the array of chars
is transformed back to string.

Comments, notes?

Whatever optimizations you have in mind for this, could they also work
for string_to_array() and array_to_string() when they get an empty
string handed to them?

My idea is that string_to_array('AHOJ') returns {A,H,O,J}.

Empty input gives an empty result: {}

As to naming, some languages use explode/implode.

It could be, but since we already have string_to_array, I think that is a
good name.


#4Chapman Flack
chap@anastigmatix.net
In reply to: Pavel Stehule (#3)
Re: string_to_array, array_to_string function without separator

On 3/15/19 11:46 AM, Pavel Stehule wrote:

On Fri, Mar 15, 2019 at 15:03, David Fetter <david@fetter.org> wrote:

Whatever optimizations you have in mind for this, could they also work
for string_to_array() and array_to_string() when they get an empty
string handed to them?

my idea is use string_to_array('AHOJ') --> {A,H,O,J}

empty input means empty result --> {}

I thought the question was maybe about an empty /delimiter/ string.

It seems that string_to_array already has this behavior if NULL is
passed as the delimiter:

select string_to_array('AHOJ', null);

string_to_array
-----------------
{A,H,O,J}

and array_to_string has the proposed behavior if passed an
empty string as the delimiter (as one would naturally expect)
... but not null for a delimiter (that just makes the result null).

So the proposal seems roughly equivalent to making string_to_array's
second parameter optional default null, and array_to_string's second
parameter optional default ''.

Does that sound right?

Regards,
-Chap
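
As an aside for readers, the behaviour described in this message can be
modelled in a few lines of Python. This is only a sketch: the two Python
functions below are hypothetical stand-ins that borrow the SQL function names,
the optional defaults are the ones proposed in this thread (not something
PostgreSQL provides), and the null_string argument of the real functions is
left out.

```python
from typing import List, Optional


def string_to_array(s: Optional[str],
                    delimiter: Optional[str] = None) -> Optional[List[str]]:
    """Toy model of the behaviour discussed above; not PostgreSQL source."""
    if s is None:
        return None          # NULL input gives NULL
    if s == "":
        return []            # empty input gives an empty array
    if delimiter is None:
        return list(s)       # NULL delimiter: one element per character
    if delimiter == "":
        return [s]           # empty delimiter: whole string as one element
    return s.split(delimiter)


def array_to_string(a: Optional[List[Optional[str]]],
                    delimiter: Optional[str] = "") -> Optional[str]:
    """Toy model; the '' default is the one proposed in the thread."""
    if a is None or delimiter is None:
        return None          # NULL delimiter makes the result NULL
    # NULL elements are skipped when no replacement string is supplied
    return delimiter.join(x for x in a if x is not None)


print(string_to_array("AHOJ"))
print(array_to_string(["A", "H", "O", "J"]))
```

With these defaults, array_to_string(string_to_array('AHOJ')) gives back
'AHOJ', which is all the proposal amounts to.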

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Chapman Flack (#4)
Re: string_to_array, array_to_string function without separator

On Fri, Mar 15, 2019 at 16:59, Chapman Flack <chap@anastigmatix.net> wrote:

On 3/15/19 11:46 AM, Pavel Stehule wrote:

On Fri, Mar 15, 2019 at 15:03, David Fetter <david@fetter.org> wrote:

Whatever optimizations you have in mind for this, could they also work
for string_to_array() and array_to_string() when they get an empty
string handed to them?

my idea is use string_to_array('AHOJ') --> {A,H,O,J}

empty input means empty result --> {}

I thought the question was maybe about an empty /delimiter/ string.

It seems that string_to_array already has this behavior if NULL is
passed as the delimiter:

select string_to_array('AHOJ', null);

string_to_array
-----------------
{A,H,O,J}

and array_to_string has the proposed behavior if passed an
empty string as the delimiter (as one would naturally expect)
... but not null for a delimiter (that just makes the result null).

So the proposal seems roughly equivalent to making string_to_array's
second parameter optional default null, and array_to_string's second
parameter optional default ''.

Does that sound right?

yes

Pavel


#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chapman Flack (#4)
Re: string_to_array, array_to_string function without separator

Chapman Flack <chap@anastigmatix.net> writes:

So the proposal seems roughly equivalent to making string_to_array's
second parameter optional default null, and array_to_string's second
parameter optional default ''.

In that case why bother? It'll just create a cross-version compatibility
hazard for next-to-no keystroke savings. If the cases were so common
that they could be argued to be sane "default" behavior, I might feel
differently --- but if you were asked in a vacuum what the default
delimiters ought to be, I don't think you'd say "no delimiter".

regards, tom lane

#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#6)
Re: string_to_array, array_to_string function without separator

On Fri, Mar 15, 2019 at 17:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Chapman Flack <chap@anastigmatix.net> writes:

So the proposal seems roughly equivalent to making string_to_array's
second parameter optional default null, and array_to_string's second
parameter optional default ''.

In that case why bother? It'll just create a cross-version compatibility
hazard for next-to-no keystroke savings. If the cases were so common
that they could be argued to be sane "default" behavior, I might feel
differently --- but if you were asked in a vacuum what the default
delimiters ought to be, I don't think you'd say "no delimiter".

My motivation is the following: sometimes I need to convert a string to an
array of characters. Using NULL as the separator works, but it is not
intuitive. When string_to_array is called without a separator, only one
meaning is possible: splitting into individual characters.

I understand that there is a possible collision, since a missing parameter
could be read as some default value. But in this case that reading is not
practical.

Regards

Pavel


#8Chapman Flack
chap@anastigmatix.net
In reply to: Tom Lane (#6)
Re: string_to_array, array_to_string function without separator

On 3/15/19 12:15 PM, Tom Lane wrote:

Chapman Flack <chap@anastigmatix.net> writes:

So the proposal seems roughly equivalent to making string_to_array's
second parameter optional default null, and array_to_string's second
parameter optional default ''.

In that case why bother? It'll just create a cross-version compatibility
hazard for next-to-no keystroke savings. If the cases were so common
that they could be argued to be sane "default" behavior, I might feel
differently --- but if you were asked in a vacuum what the default
delimiters ought to be, I don't think you'd say "no delimiter".

One could go further and argue that the non-optional arguments improve
clarity: a reader seeing the explicit NULL or '' argument gets a strong
clue what's intended, who in the optional-argument case might end up
thinking "must go look up what this function's default delimiter is".

-Chap

#9Chapman Flack
chap@anastigmatix.net
In reply to: Pavel Stehule (#7)
Re: string_to_array, array_to_string function without separator

On 3/15/19 12:26 PM, Pavel Stehule wrote:

you use string_to_array function without separator, then only one possible
semantic is there - separation by chars.

Other languages can and do specify other semantics for the
separator-omitted case: often (as in Python) it means to split
around "runs of one or more characters the platform considers white
space", as a convenience, given that it's a fairly commonly wanted
meaning but can be tedious to spell out as an explicit separator.
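
To illustrate the Python behaviour mentioned above, here is a small sketch
using str.split; the sample string is made up for the example. With no
separator Python splits around runs of whitespace, which is a different
default from the per-character split proposed in this thread.

```python
text = "a  b\tc"   # two spaces between a and b, a tab between b and c

# No separator: split around runs of whitespace of any kind.
words = text.split()

# Explicit single-space separator: every space is a boundary, so the
# doubled space yields an empty string and the tab is not a boundary.
by_space = text.split(" ")

# One element per character, the meaning proposed in this thread.
chars = list("AHOJ")

print(words)
print(by_space)
print(chars)
```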

I admit I think a separator of '' would be more clear than null,
so if I were designing string_to_array in a green field, I think
I would swap the meanings of null and '' as the delimiter: null
would mean "don't really split anything", and '' would mean "split
everywhere you can find '' in the string", that is, everywhere.

But the current behavior is already established....

Regards,
-Chap

#10Pavel Stehule
pavel.stehule@gmail.com
In reply to: Chapman Flack (#9)
Re: string_to_array, array_to_string function without separator

On Fri, Mar 15, 2019 at 17:54, Chapman Flack <chap@anastigmatix.net> wrote:

On 3/15/19 12:26 PM, Pavel Stehule wrote:

you use string_to_array function without separator, then only one possible
semantic is there - separation by chars.

Other languages can and do specify other semantics for the
separator-omitted case: often (as in Python) it means to split
around "runs of one or more characters the platform considers white
space", as a convenience, given that it's a fairly commonly wanted
meaning but can be tedious to spell out as an explicit separator.

For this proposal a "char" is a character, not a byte:

result[n] = substring(str FROM n FOR 1)

I admit I think a separator of '' would be more clear than null,
so if I were designing string_to_array in a green field, I think
I would swap the meanings of null and '' as the delimiter: null
would mean "don't really split anything", and '' would mean "split
everywhere you can find '' in the string", that is, everywhere.

But the current behavior is already established....

yes

Pavel


#11Chapman Flack
chap@anastigmatix.net
In reply to: Pavel Stehule (#10)
Re: string_to_array, array_to_string function without separator

On 3/15/19 12:59 PM, Pavel Stehule wrote:

for this proposal "char" != byte

result[n] = substring(str FROM n FOR 1)

I think that's what string_to_array(..., null) already does:

SHOW server_encoding;
server_encoding
UTF8

WITH
  t0(s) AS (SELECT text 'verlorn ist daz slüzzelîn'),
  t1(a) AS (SELECT string_to_array(s, null) FROM t0)
SELECT
  char_length(s), octet_length(convert_to(s, 'UTF8')),
  array_length(a,1), a
FROM
  t0, t1;

char_length|octet_length|array_length|a
25|27|25|{v,e,r,l,o,r,n," ",i,s,t," ",d,a,z," ",s,l,ü,z,z,e,l,î,n}
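
The same character-versus-byte distinction can be checked outside the
database. A minimal Python sketch, assuming the accented letters are single
precomposed code points (written as escapes below to make that explicit);
substring(str FROM n FOR 1) corresponds to s[n - 1] here.

```python
# Same sample text as in the query above; \u00fc is u-umlaut and
# \u00ee is i-circumflex, each a single code point.
s = "verlorn ist daz sl\u00fczzel\u00een"

chars = list(s)            # one element per character (code point)
utf8 = s.encode("utf-8")   # the two accented letters take two bytes each

print(len(s), len(utf8), len(chars))
print(chars)
```

The character count and the array length agree, while the byte count is
larger by one for each of the two accented letters.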

Regards,
-Chap

#12Pavel Stehule
pavel.stehule@gmail.com
In reply to: Chapman Flack (#11)
Re: string_to_array, array_to_string function without separator

On Fri, Mar 15, 2019 at 18:30, Chapman Flack <chap@anastigmatix.net> wrote:

On 3/15/19 12:59 PM, Pavel Stehule wrote:

for this proposal "char" != byte

result[n] = substring(str FROM n FOR 1)

I think that's what string_to_array(..., null) already does:

Sure. My proposal is more or less just a way to omit the null parameter.


#13David Fetter
david@fetter.org
In reply to: Chapman Flack (#8)
Re: string_to_array, array_to_string function without separator

On Fri, Mar 15, 2019 at 12:31:21PM -0400, Chapman Flack wrote:

On 3/15/19 12:15 PM, Tom Lane wrote:

Chapman Flack <chap@anastigmatix.net> writes:

So the proposal seems roughly equivalent to making string_to_array's
second parameter optional default null, and array_to_string's second
parameter optional default ''.

In that case why bother? It'll just create a cross-version compatibility
hazard for next-to-no keystroke savings. If the cases were so common
that they could be argued to be sane "default" behavior, I might feel
differently --- but if you were asked in a vacuum what the default
delimiters ought to be, I don't think you'd say "no delimiter".

One could go further and argue that the non-optional arguments improve
clarity: a reader seeing the explicit NULL or '' argument gets a strong
clue what's intended, who in the optional-argument case might end up
thinking "must go look up what this function's default delimiter is".

Going to look up the function's behavior would be much more fun if
there were comments on these functions explaining things. I'll draft
up a patch for some of that.

In a similar vein, I haven't been able to come up with hazards of
naming function parameters in some document-ish way. What did I miss?

Best,
David.