split_part for the last element

Started by Nikhil Beneschover 5 years ago7 messagesgeneral
Jump to latest
#1Nikhil Benesch
nikhil.benesch@gmail.com

Hi,

Suppose I need to split a string on a delimiter and select one of the
resulting components. If I want a specific component counting from the
start, that's easy:

split_part('foo bar baz', ' ', 1) -> 'foo'

But if I want the last component, I have several less-than-ideal options:

1. (string_to_array('foo bar baz', '
'))[cardinality(string_to_array('foo bar baz', ' ')) - 1]
2. reverse(split_part(reverse('foo bar baz'), ' ', 1))
3. (regexp_match('foo baz bar', '\S*$'))[1]

Option 1 is probably the most understandable, especially if you are
willing to introduce a temporary parts array:

select parts[cardinality(parts) - 1] from string_to_array('foo bar
baz', ' ') parts

But if the strings are long, this needlessly builds an array just to
throw it away. Option 2 has similar efficiency problems and is just
kind of silly. Option 3 is probably the best, but it's still a good
bit more complicated than a simple split_part invocation.

Is there another option I'm missing? Would there be interest in
extending split part so that negative indices counted from the end, as
in:

split_part('foo bar baz', ' ', -1) -> 'baz'

Or adding a split_part_end function in which positive indices counted
from the end:

split_part_end('foo bar baz', ' ', 1) -> 'baz'

I'd be happy to prepare a patch if so.

Cheers,
Nikhil

#2PALAYRET Jacques
jacques.palayret@meteo.fr
In reply to: Nikhil Benesch (#1)
Re: split_part for the last element

Hello,

reverse(split_part(reverse('foo bar baz'), ' ', 1)) -> 'baz'

Regards

----- Mail original -----
De: "Nikhil Benesch" <nikhil.benesch@gmail.com>
À: pgsql-general@lists.postgresql.org
Envoyé: Vendredi 23 Octobre 2020 17:47:16
Objet: split_part for the last element

Hi,

Suppose I need to split a string on a delimiter and select one of the
resulting components. If I want a specific component counting from the
start, that's easy:

split_part('foo bar baz', ' ', 1) -> 'foo'

But if I want the last component, I have several less-than-ideal options:

1. (string_to_array('foo bar baz', '
'))[cardinality(string_to_array('foo bar baz', ' ')) - 1]
2. reverse(split_part(reverse('foo bar baz'), ' ', 1))
3. (regexp_match('foo baz bar', '\S*$'))[1]

Option 1 is probably the most understandable, especially if you are
willing to introduce a temporary parts array:

select parts[cardinality(parts) - 1] from string_to_array('foo bar
baz', ' ') parts

But if the strings are long, this needlessly builds an array just to
throw it away. Option 2 has similar efficiency problems and is just
kind of silly. Option 3 is probably the best, but it's still a good
bit more complicated than a simple split_part invocation.

Is there another option I'm missing? Would there be interest in
extending split part so that negative indices counted from the end, as
in:

split_part('foo bar baz', ' ', -1) -> 'baz'

Or adding a split_part_end function in which positive indices counted
from the end:

split_part_end('foo bar baz', ' ', 1) -> 'baz'

I'd be happy to prepare a patch if so.

Cheers,
Nikhil

#3Nikhil Benesch
nikhil.benesch@gmail.com
In reply to: PALAYRET Jacques (#2)
Re: split_part for the last element

Right, that's option 2 in my original mail. There are several
deficiencies with that idiom:

* It is non-obvious. Sure, it might make sense to you and I, but to
someone just learning SQL, it takes a minute to reason through why it
works. They're also unlikely to invent the trick on their own.
* It is inefficient. When the strings are large reversing the
strings is a silly waste of compute.

On Fri, Oct 23, 2020 at 12:03 PM PALAYRET Jacques
<jacques.palayret@meteo.fr> wrote:

Show quoted text

Hello,

reverse(split_part(reverse('foo bar baz'), ' ', 1)) -> 'baz'

Regards

----- Mail original -----
De: "Nikhil Benesch" <nikhil.benesch@gmail.com>
À: pgsql-general@lists.postgresql.org
Envoyé: Vendredi 23 Octobre 2020 17:47:16
Objet: split_part for the last element

Hi,

Suppose I need to split a string on a delimiter and select one of the
resulting components. If I want a specific component counting from the
start, that's easy:

split_part('foo bar baz', ' ', 1) -> 'foo'

But if I want the last component, I have several less-than-ideal options:

1. (string_to_array('foo bar baz', '
'))[cardinality(string_to_array('foo bar baz', ' ')) - 1]
2. reverse(split_part(reverse('foo bar baz'), ' ', 1))
3. (regexp_match('foo baz bar', '\S*$'))[1]

Option 1 is probably the most understandable, especially if you are
willing to introduce a temporary parts array:

select parts[cardinality(parts) - 1] from string_to_array('foo bar
baz', ' ') parts

But if the strings are long, this needlessly builds an array just to
throw it away. Option 2 has similar efficiency problems and is just
kind of silly. Option 3 is probably the best, but it's still a good
bit more complicated than a simple split_part invocation.

Is there another option I'm missing? Would there be interest in
extending split part so that negative indices counted from the end, as
in:

split_part('foo bar baz', ' ', -1) -> 'baz'

Or adding a split_part_end function in which positive indices counted
from the end:

split_part_end('foo bar baz', ' ', 1) -> 'baz'

I'd be happy to prepare a patch if so.

Cheers,
Nikhil

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Nikhil Benesch (#1)
Re: split_part for the last element

On Fri, Oct 23, 2020 at 8:47 AM Nikhil Benesch <nikhil.benesch@gmail.com>
wrote:

Is there another option I'm missing? Would there be interest in
extending split part so that negative indices counted from the end, as
in:

split_part('foo bar baz', ' ', -1) -> 'baz'

Some thoughts:

I'm torn here because this would be the first usage of this concept in
PostgreSQL (I think).

Tangentially, I noticed that we have a "starts_with" function but no
corresponding "end_with".

It's been a while but there used to be a systemic inertia working
against adding minor useful functions such as these.

With the new documentation layout I would at least consider updating the
description for the normal functions with an example on how to formulate an
expression that works contra-normally, and in the case where there does
exist such a specialized function, naming it.

David J.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#4)
Re: split_part for the last element

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Fri, Oct 23, 2020 at 8:47 AM Nikhil Benesch <nikhil.benesch@gmail.com>
wrote:

Is there another option I'm missing? Would there be interest in
extending split part so that negative indices counted from the end, as
in:
split_part('foo bar baz', ' ', -1) -> 'baz'

I'm torn here because this would be the first usage of this concept in
PostgreSQL (I think).

We already have some JSON functions that act like that, not to mention
the left() and right() string functions, so I don't see that much of an
argument against extending split_part to do it.

Tangentially, I noticed that we have a "starts_with" function but no
corresponding "end_with".
It's been a while but there used to be a systemic inertia working
against adding minor useful functions such as these.

Part of the reason for that bias is that these one-off functions tend
not to be very well thought out or complete :-(. The point that
could be raised here is why we have split_part but not any corresponding
regex-based splitter.

regards, tom lane

#6Nikhil Benesch
nikhil.benesch@gmail.com
In reply to: David G. Johnston (#4)
Re: split_part for the last element

On Fri, Oct 23, 2020 at 2:21 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

I'm torn here because this would be the first usage of this concept in
PostgreSQL (I think).

Yeah, I also have some qualms about this design in the context of Postgres.
Particularly because Postgres allows arrays to begin at negative indices.

Tangentially, I noticed that we have a "starts_with" function but no
corresponding "end_with".

Ah, interesting. On the other hand, there are both "left" and "right",
"lpad" and "rpad", and "ltrim" and "rtrim". And at least ends_with has the
fairly elegant alternative of "s LIKE '%suffix'".

It's been a while but there used to be a systemic inertia working against
adding minor useful functions such as these.

With the new documentation layout I would at least consider updating the
description for the normal functions with an example on how to formulate
an expression that works contra-normally, and in the case where there does
exist such a specialized function, naming it.

Supposing you go this route, which of the options would you envision
mentioning as the converse of split_part?

#7Nikhil Benesch
nikhil.benesch@gmail.com
In reply to: Tom Lane (#5)
Re: split_part for the last element

On Fri, Oct 23, 2020 at 2:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Fri, Oct 23, 2020 at 8:47 AM Nikhil Benesch <nikhil.benesch@gmail.com>
wrote:

Is there another option I'm missing? Would there be interest in
extending split part so that negative indices counted from the end, as
in:
split_part('foo bar baz', ' ', -1) -> 'baz'

I'm torn here because this would be the first usage of this concept in
PostgreSQL (I think).

We already have some JSON functions that act like that, not to mention
the left() and right() string functions, so I don't see that much of an
argument against extending split_part to do it.

Oh, I didn't realize left and right already worked this way. That
makes this design much more compelling, at least to me.

If the consensus is this extension is ok, I'd be happy to prepare a patch.