TRIM_ARRAY

Started by Vik Fearingabout 5 years ago12 messageshackers
Jump to latest
#1Vik Fearing
vik@postgresfriends.org

The SQL standard defines a function called TRIM_ARRAY that surprisingly
has syntax that looks like a function! So I implemented it using a thin
wrapper around our array slice syntax. It is literally just ($1)[1:$2].

An interesting case that I decided to handle by explaining it in the
docs is that this won't give you the first n elements if your lower
bound is not 1. My justification for this is 1) non-standard lower
bounds are so rare in the wild that 2) people using them can just not
use this function. The alternative is to go through the unnest dance
(or write it in C) which defeats inlining.

Patch attached.
--
Vik Fearing

Attachments:

0001-implement-trim_array.patchtext/x-patch; charset=UTF-8; name=0001-implement-trim_array.patchDownload+58-1
#2Isaac Morland
isaac.morland@gmail.com
In reply to: Vik Fearing (#1)
Re: TRIM_ARRAY

On Tue, 16 Feb 2021 at 12:54, Vik Fearing <vik@postgresfriends.org> wrote:

The SQL standard defines a function called TRIM_ARRAY that surprisingly
has syntax that looks like a function! So I implemented it using a thin
wrapper around our array slice syntax. It is literally just ($1)[1:$2].

An interesting case that I decided to handle by explaining it in the
docs is that this won't give you the first n elements if your lower
bound is not 1. My justification for this is 1) non-standard lower
bounds are so rare in the wild that 2) people using them can just not
use this function. The alternative is to go through the unnest dance
(or write it in C) which defeats inlining.

I don't recall ever seeing non-default lower bounds, so I actually think
it's OK to just rule out that scenario, but why not something like this:

($1)[:array_lower ($1, 1) + $2 - 1]

Note that I've used the 9.6 feature that allows omitting the lower bound.

#3Vik Fearing
vik@postgresfriends.org
In reply to: Isaac Morland (#2)
Re: TRIM_ARRAY

On 2/16/21 7:32 PM, Isaac Morland wrote:

On Tue, 16 Feb 2021 at 12:54, Vik Fearing <vik@postgresfriends.org> wrote:

The SQL standard defines a function called TRIM_ARRAY that surprisingly
has syntax that looks like a function! So I implemented it using a thin
wrapper around our array slice syntax. It is literally just ($1)[1:$2].

An interesting case that I decided to handle by explaining it in the
docs is that this won't give you the first n elements if your lower
bound is not 1. My justification for this is 1) non-standard lower
bounds are so rare in the wild that 2) people using them can just not
use this function. The alternative is to go through the unnest dance
(or write it in C) which defeats inlining.

I don't recall ever seeing non-default lower bounds, so I actually think
it's OK to just rule out that scenario, but why not something like this:

($1)[:array_lower ($1, 1) + $2 - 1]

I'm kind of embarrassed that I didn't think about doing that; it is a
much better solution. You lose the non-standard bounds but I don't
think there is any way besides C to keep the lower bound regardless of
how you trim it.

V2 attached.
--
Vik Fearing

Attachments:

0001-implement-trim_array.v2.patchtext/x-patch; charset=UTF-8; name=0001-implement-trim_array.v2.patchDownload+56-1
#4Vik Fearing
vik@postgresfriends.org
In reply to: Vik Fearing (#3)
Re: TRIM_ARRAY

On 2/16/21 11:38 PM, Vik Fearing wrote:

On 2/16/21 7:32 PM, Isaac Morland wrote:

On Tue, 16 Feb 2021 at 12:54, Vik Fearing <vik@postgresfriends.org> wrote:

The SQL standard defines a function called TRIM_ARRAY that surprisingly
has syntax that looks like a function! So I implemented it using a thin
wrapper around our array slice syntax. It is literally just ($1)[1:$2].

An interesting case that I decided to handle by explaining it in the
docs is that this won't give you the first n elements if your lower
bound is not 1. My justification for this is 1) non-standard lower
bounds are so rare in the wild that 2) people using them can just not
use this function. The alternative is to go through the unnest dance
(or write it in C) which defeats inlining.

I don't recall ever seeing non-default lower bounds, so I actually think
it's OK to just rule out that scenario, but why not something like this:

($1)[:array_lower ($1, 1) + $2 - 1]

I'm kind of embarrassed that I didn't think about doing that; it is a
much better solution. You lose the non-standard bounds but I don't
think there is any way besides C to keep the lower bound regardless of
how you trim it.

I've made a bit of a mess out of this, but I partly blame the standard
which is very unclear. It actually describes trimming the right n
elements instead of the left n like I've done here. I'll be back later
with a better patch that does what it's actually supposed to.
--
Vik Fearing

#5Vik Fearing
vik@postgresfriends.org
In reply to: Vik Fearing (#4)
Re: TRIM_ARRAY

On 2/17/21 1:25 AM, Vik Fearing wrote:

I've made a bit of a mess out of this, but I partly blame the standard
which is very unclear. It actually describes trimming the right n
elements instead of the left n like I've done here. I'll be back later
with a better patch that does what it's actually supposed to.

And here is that patch.

Since the only justification I have for such a silly function is that
it's part of the standard, I decided to also issue the errors that the
standard describes which means the new function is now in C.
--
Vik Fearing

Attachments:

0001-implement-trim_array.v3.patchtext/x-patch; charset=UTF-8; name=0001-implement-trim_array.v3.patchDownload+108-2
#6Dian M Fay
dian.m.fay@gmail.com
In reply to: Vik Fearing (#5)
Re: TRIM_ARRAY

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, failed
Documentation: tested, failed

This basically does what it says, and the code looks good. The
documentation is out of alphabetical order (trim_array should appear
under cardinality rather than over)) but good otherwise. I was able to
"break" the function with an untyped null in psql:

select trim_array(null, 2);
ERROR: could not determine polymorphic type because input has type unknown

I don't know whether there are any circumstances other than manual entry
in psql where this could happen, since column values and variables will
always be typed. I don't have access to the standard, but DB2's docs[1]https://www.ibm.com/support/knowledgecenter/en/SSEPEK_12.0.0/sqlref/src/tpc/db2z_bif_trimarray.html
note "if any argument is null, the result is the null value", so an
up-front null check might be preferable to a slightly arcane user-facing
error, even if it's a silly invocation of a silly function :)

[1]: https://www.ibm.com/support/knowledgecenter/en/SSEPEK_12.0.0/sqlref/src/tpc/db2z_bif_trimarray.html

The new status of this patch is: Waiting on Author

#7Vik Fearing
vik@postgresfriends.org
In reply to: Dian M Fay (#6)
Re: TRIM_ARRAY

On 3/2/21 12:14 AM, Dian Fay wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, failed
Documentation: tested, failed

Thank you for looking at my patch!

This basically does what it says, and the code looks good. The
documentation is out of alphabetical order (trim_array should appear
under cardinality rather than over)) but good otherwise.

Hmm. It appears between cardinality and unnest in the source code and
also my compiled html. Can you say more about where you're seeing the
wrong order?

I was able to
"break" the function with an untyped null in psql:

select trim_array(null, 2);
ERROR: could not determine polymorphic type because input has type unknown

I don't know whether there are any circumstances other than manual entry
in psql where this could happen, since column values and variables will
always be typed. I don't have access to the standard, but DB2's docs[1]
note "if any argument is null, the result is the null value", so an
up-front null check might be preferable to a slightly arcane user-facing
error, even if it's a silly invocation of a silly function :)

[1] https://www.ibm.com/support/knowledgecenter/en/SSEPEK_12.0.0/sqlref/src/tpc/db2z_bif_trimarray.html

The standard also says that if either argument is null, the result is
null. The problem here is that postgres needs to know what the return
type is and it can only determine that from the input.

If you give the function a typed null, it returns null as expected.

The new status of this patch is: Waiting on Author

I put it back to Needs Review without a new patch because I don't know
what I would change.
--
Vik Fearing

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dian M Fay (#6)
Re: TRIM_ARRAY

Dian Fay <dian.m.fay@gmail.com> writes:

This basically does what it says, and the code looks good. The
documentation is out of alphabetical order (trim_array should appear
under cardinality rather than over)) but good otherwise. I was able to
"break" the function with an untyped null in psql:

select trim_array(null, 2);
ERROR: could not determine polymorphic type because input has type unknown

That's a generic parser behavior for polymorphic functions, not something
this particular function could or should dodge.

regards, tom lane

#9Dian M Fay
dian.m.fay@gmail.com
In reply to: Vik Fearing (#7)
Re: TRIM_ARRAY

On Mon Mar 1, 2021 at 6:53 PM EST, Vik Fearing wrote:

This basically does what it says, and the code looks good. The
documentation is out of alphabetical order (trim_array should appear
under cardinality rather than over)) but good otherwise.

Hmm. It appears between cardinality and unnest in the source code and
also my compiled html. Can you say more about where you're seeing the
wrong order?

I applied the patch to the latest commit, ffd3944ab9. Table 9.52 is
ordered:

array_to_string
array_upper
trim_array
cardinality
unnest

The problem here is that postgres needs to know what the return
type is and it can only determine that from the input.

If you give the function a typed null, it returns null as expected.

The new status of this patch is: Waiting on Author

I put it back to Needs Review without a new patch because I don't know
what I would change.

I'd thought that checking v and returning null instead of raising the
error would be more friendly, should it be possible to pass an untyped
null accidentally instead of on purpose, and I couldn't rule that out.
I've got no objections other than the docs having been displaced.

#10Vik Fearing
vik@postgresfriends.org
In reply to: Dian M Fay (#9)
Re: TRIM_ARRAY

On 3/2/21 1:02 AM, Dian M Fay wrote:

On Mon Mar 1, 2021 at 6:53 PM EST, Vik Fearing wrote:

This basically does what it says, and the code looks good. The
documentation is out of alphabetical order (trim_array should appear
under cardinality rather than over)) but good otherwise.

Hmm. It appears between cardinality and unnest in the source code and
also my compiled html. Can you say more about where you're seeing the
wrong order?

I applied the patch to the latest commit, ffd3944ab9. Table 9.52 is
ordered:

array_to_string
array_upper
trim_array
cardinality
unnest

So it turns out I must have fixed it locally after I posted the patch
and then forgot I did that. Attached is a new patch with the order
correct. Thanks for spotting it!

The problem here is that postgres needs to know what the return
type is and it can only determine that from the input.

If you give the function a typed null, it returns null as expected.

The new status of this patch is: Waiting on Author

I put it back to Needs Review without a new patch because I don't know
what I would change.

I'd thought that checking v and returning null instead of raising the
error would be more friendly, should it be possible to pass an untyped
null accidentally instead of on purpose, and I couldn't rule that out.

As Tom said, that is something that does not belong in this patch.
--
Vik Fearing

Attachments:

0001-implement-trim_array.v4.patchtext/x-patch; charset=UTF-8; name=0001-implement-trim_array.v4.patchDownload+108-2
#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vik Fearing (#10)
Re: TRIM_ARRAY

Vik Fearing <vik@postgresfriends.org> writes:

On 3/2/21 1:02 AM, Dian M Fay wrote:

I'd thought that checking v and returning null instead of raising the
error would be more friendly, should it be possible to pass an untyped
null accidentally instead of on purpose, and I couldn't rule that out.

As Tom said, that is something that does not belong in this patch.

Yeah, the individual function really doesn't have any way to affect
this, since the error happens on the way to identifying which function
we should call in the first place.

I had the same problem as Dian of the func.sgml hunk winding up in
the wrong place. I think this is practically inevitable unless the
submitter uses more than 3 lines of context for the diff, because
otherwise the context is just boilerplate that looks the same
everywhere in the function tables. Unless the diff is 100% up to date
so that the line numbers are exactly right, patch is likely to guess
wrong about where to insert the new hunk. We'll just have to be
vigilant about that.

I fooled with your test case a bit ... I didn't think it was really
necessary to create and drop a table, when we could just use a VALUES
clause as source of test data. Also you'd forgotten to update the
"descr" description of the function to match the final understanding
of the semantics.

Looks good otherwise, so pushed.

regards, tom lane

#12Vik Fearing
vik@postgresfriends.org
In reply to: Tom Lane (#11)
Re: TRIM_ARRAY

On 3/3/21 10:47 PM, Tom Lane wrote:

I had the same problem as Dian of the func.sgml hunk winding up in
the wrong place. I think this is practically inevitable unless the
submitter uses more than 3 lines of context for the diff, because
otherwise the context is just boilerplate that looks the same
everywhere in the function tables. Unless the diff is 100% up to date
so that the line numbers are exactly right, patch is likely to guess
wrong about where to insert the new hunk. We'll just have to be
vigilant about that.

Noted.

I fooled with your test case a bit ... I didn't think it was really
necessary to create and drop a table, when we could just use a VALUES
clause as source of test data. Also you'd forgotten to update the
"descr" description of the function to match the final understanding
of the semantics.

Thank you.

Looks good otherwise, so pushed.

Thanks!
--
Vik Fearing