Compatible defaults for LEAD/LAG

Started by Vik Fearingalmost 6 years ago18 messageshackers

vik@postgresfriends.org

almost 6 years ago

I noticed that the PostgreSQL entry in a pan-database feature matrix by
Modern SQL was not reflecting the reality of our features.[1]https://twitter.com/pg_xocolatl/status/1266694496194093057 -- Vik Fearing

It turns out that test case used by the author produced an error which
the tool took to mean the feature was not implemented. I don't have the
actual test, but here is a simulation of it:

postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
postgres-# ORDER BY n;

ERROR: function lag(numeric, integer, integer) does not exist
LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
^
HINT: No function matches the given name and argument types. You
might need to add explicit type casts.

Attached is a patch that fixes this issue using the new anycompatible
pseudotype. I am hoping this can be slipped into 13 even though it
requires a catversion bump after BETA1.

I looked for other functions with a similar issue but didn't find any.

[1]: https://twitter.com/pg_xocolatl/status/1266694496194093057 -- Vik Fearing
--
Vik Fearing

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Vik Fearing (#1)

Re: Compatible defaults for LEAD/LAG

Vik Fearing <vik@postgresfriends.org> writes:

postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
postgres-# ORDER BY n;
ERROR: function lag(numeric, integer, integer) does not exist
LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
^

Yeah, we have similar issues elsewhere.

Attached is a patch that fixes this issue using the new anycompatible
pseudotype. I am hoping this can be slipped into 13 even though it
requires a catversion bump after BETA1.

When the anycompatible patch went in, I thought for a little bit about
trying to use it with existing built-in functions, but didn't have the
time to investigate the issue in detail. I'm not in favor of hacking
things one-function-at-a-time here; we should look through the whole
library and see what we've got.

The main thing that makes this perhaps-not-trivial is that an
anycompatible-ified function will match more cases than it would have
before, possibly causing conflicts if the function or operator name
is overloaded. We'd have to look at such cases and decide what we
want to do --- one answer would be to drop some of the alternatives
and rely on the parser to add casts, but that might slow things down.

Anyway, I agree that this is a good direction to pursue, but not in
a last-minute-hack-for-v13 way.

regards, tom lane

Vik Fearing

vik@postgresfriends.org

almost 6 years ago

In reply to: Tom Lane (#2)

Re: Compatible defaults for LEAD/LAG

On 5/31/20 9:53 PM, Tom Lane wrote:

Vik Fearing <vik@postgresfriends.org> writes:

postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
postgres-# ORDER BY n;
ERROR: function lag(numeric, integer, integer) does not exist
LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
^

Yeah, we have similar issues elsewhere.

Attached is a patch that fixes this issue using the new anycompatible
pseudotype. I am hoping this can be slipped into 13 even though it
requires a catversion bump after BETA1.

When the anycompatible patch went in, I thought for a little bit about
trying to use it with existing built-in functions, but didn't have the
time to investigate the issue in detail. I'm not in favor of hacking
things one-function-at-a-time here; we should look through the whole
library and see what we've got.

The main thing that makes this perhaps-not-trivial is that an
anycompatible-ified function will match more cases than it would have
before, possibly causing conflicts if the function or operator name
is overloaded. We'd have to look at such cases and decide what we
want to do --- one answer would be to drop some of the alternatives
and rely on the parser to add casts, but that might slow things down.

Anyway, I agree that this is a good direction to pursue, but not in
a last-minute-hack-for-v13 way.

Fair enough. I put it in the commitfest app for version 14.
https://commitfest.postgresql.org/28/2574/
--
Vik Fearing

Vik Fearing

vik@postgresfriends.org

almost 6 years ago

In reply to: Tom Lane (#2)

Re: Compatible defaults for LEAD/LAG

On 5/31/20 9:53 PM, Tom Lane wrote:

When the anycompatible patch went in, I thought for a little bit about
trying to use it with existing built-in functions, but didn't have the
time to investigate the issue in detail. I'm not in favor of hacking
things one-function-at-a-time here; we should look through the whole
library and see what we've got.

BTW, I did go through pg_proc.dat to see what we've got and these were
the only two I found. I mentioned that in a part you didn't quote. Now
I went through again, this time using a query on pg_proc itself, and I
missed array_replace during my manual scan.

array_replace, lead, and lag are the only functions we have that take
more than one anyelement.

There are many functions (seemingly all operator implementations) that
take multiple anyarray, anyrange, and anyenum; but none with anynonarray
and only those three for anyelement. Are you sure we don't want to give
at least the anycompatible type a nice public workout with this?
--
Vik Fearing

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Vik Fearing (#4)

Re: Compatible defaults for LEAD/LAG

Vik Fearing <vik@postgresfriends.org> writes:

On 5/31/20 9:53 PM, Tom Lane wrote:

When the anycompatible patch went in, I thought for a little bit about
trying to use it with existing built-in functions, but didn't have the
time to investigate the issue in detail. I'm not in favor of hacking
things one-function-at-a-time here; we should look through the whole
library and see what we've got.

array_replace, lead, and lag are the only functions we have that take
more than one anyelement.

That's just the tip of the iceberg, though. If you consider all the
old-style polymorphic types, we have

proc
-----------------------------------------------
array_append(anyarray,anyelement)
array_cat(anyarray,anyarray)
array_eq(anyarray,anyarray)
array_ge(anyarray,anyarray)
array_gt(anyarray,anyarray)
array_larger(anyarray,anyarray)
array_le(anyarray,anyarray)
array_lt(anyarray,anyarray)
array_ne(anyarray,anyarray)
array_position(anyarray,anyelement)
array_position(anyarray,anyelement,integer)
array_positions(anyarray,anyelement)
array_prepend(anyelement,anyarray)
array_remove(anyarray,anyelement)
array_replace(anyarray,anyelement,anyelement)
array_smaller(anyarray,anyarray)
arraycontained(anyarray,anyarray)
arraycontains(anyarray,anyarray)
arrayoverlap(anyarray,anyarray)
btarraycmp(anyarray,anyarray)
elem_contained_by_range(anyelement,anyrange)
lag(anyelement,integer,anyelement)
lead(anyelement,integer,anyelement)
range_adjacent(anyrange,anyrange)
range_after(anyrange,anyrange)
range_before(anyrange,anyrange)
range_cmp(anyrange,anyrange)
range_contained_by(anyrange,anyrange)
range_contains(anyrange,anyrange)
range_contains_elem(anyrange,anyelement)
range_eq(anyrange,anyrange)
range_ge(anyrange,anyrange)
range_gist_same(anyrange,anyrange,internal)
range_gt(anyrange,anyrange)
range_intersect(anyrange,anyrange)
range_le(anyrange,anyrange)
range_lt(anyrange,anyrange)
range_merge(anyrange,anyrange)
range_minus(anyrange,anyrange)
range_ne(anyrange,anyrange)
range_overlaps(anyrange,anyrange)
range_overleft(anyrange,anyrange)
range_overright(anyrange,anyrange)
range_union(anyrange,anyrange)
width_bucket(anyelement,anyarray)
(45 rows)

(I ignored anyenum here, since it has no mapping to the anycompatible
family.) Some of these are internal or can otherwise be discounted,
but surely there'd be a market for, say, "int8[] || int4". The big
question that raises is can we do it without creating impossible confusion
with text-style concatenation.

Are you sure we don't want to give
at least the anycompatible type a nice public workout with this?

Yes, I'm quite sure. If the idea crashes and burns for some reason,
we'll be glad we didn't buy into it full-speed-ahead right away.

regards, tom lane

Pavel Stehule

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Tom Lane (#5)

Re: Compatible defaults for LEAD/LAG

po 1. 6. 2020 v 4:07 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Vik Fearing <vik@postgresfriends.org> writes:

On 5/31/20 9:53 PM, Tom Lane wrote:

When the anycompatible patch went in, I thought for a little bit about
trying to use it with existing built-in functions, but didn't have the
time to investigate the issue in detail. I'm not in favor of hacking
things one-function-at-a-time here; we should look through the whole
library and see what we've got.

array_replace, lead, and lag are the only functions we have that take
more than one anyelement.

That's just the tip of the iceberg, though. If you consider all the
old-style polymorphic types, we have

proc
-----------------------------------------------
array_append(anyarray,anyelement)
array_cat(anyarray,anyarray)
array_eq(anyarray,anyarray)
array_ge(anyarray,anyarray)
array_gt(anyarray,anyarray)
array_larger(anyarray,anyarray)
array_le(anyarray,anyarray)
array_lt(anyarray,anyarray)
array_ne(anyarray,anyarray)
array_position(anyarray,anyelement)
array_position(anyarray,anyelement,integer)
array_positions(anyarray,anyelement)
array_prepend(anyelement,anyarray)
array_remove(anyarray,anyelement)
array_replace(anyarray,anyelement,anyelement)
array_smaller(anyarray,anyarray)
arraycontained(anyarray,anyarray)
arraycontains(anyarray,anyarray)
arrayoverlap(anyarray,anyarray)
btarraycmp(anyarray,anyarray)

I am not sure, if using anycompatible for buildin's array functions can be
good idea. Theoretically a users can do new performance errors due hidden
cast of a longer array.

For example array_positions(int[], numeric). In this case conversion int[]
to numeric[] can be bad idea in some cases. Reversely in this case we want
to convert numeric to int. When it is not possible without loss, then we
can return false, or maybe raise exception. I designed anycompatible* for
usage when the parameters has "symmetric weight", and cast to most common
type should not to have performance issue. It is not this case. When I
though about this cases, and about designing functions compatible with
Oracle I though about another generic family (families) with different
behave (specified by suffix or maybe with typemod or with some syntax):

a) force_cast .. it can be good for array's functions -
array_position(anyarray, anyelement_force_cast), array_position(anyarray,
anyelement(force_cast)) .. this is often behave in Oracle
b) force_safe_cast .. special kind of casting - safer variant of tolerant
Oracle's casting - 1.0::int is valid, 1.1::int is not valid in this type of
casting

anycompatible* family can helps with some cases, but it is not silver
bullet for all unfriendly, or not compatible situation (mainly for buildin
functionality).

Regards

Pavel

Show quoted text

elem_contained_by_range(anyelement,anyrange)
lag(anyelement,integer,anyelement)
lead(anyelement,integer,anyelement)
range_adjacent(anyrange,anyrange)
range_after(anyrange,anyrange)
range_before(anyrange,anyrange)
range_cmp(anyrange,anyrange)
range_contained_by(anyrange,anyrange)
range_contains(anyrange,anyrange)
range_contains_elem(anyrange,anyelement)
range_eq(anyrange,anyrange)
range_ge(anyrange,anyrange)
range_gist_same(anyrange,anyrange,internal)
range_gt(anyrange,anyrange)
range_intersect(anyrange,anyrange)
range_le(anyrange,anyrange)
range_lt(anyrange,anyrange)
range_merge(anyrange,anyrange)
range_minus(anyrange,anyrange)
range_ne(anyrange,anyrange)
range_overlaps(anyrange,anyrange)
range_overleft(anyrange,anyrange)
range_overright(anyrange,anyrange)
range_union(anyrange,anyrange)
width_bucket(anyelement,anyarray)
(45 rows)

(I ignored anyenum here, since it has no mapping to the anycompatible
family.) Some of these are internal or can otherwise be discounted,
but surely there'd be a market for, say, "int8[] || int4". The big
question that raises is can we do it without creating impossible confusion
with text-style concatenation.

Are you sure we don't want to give
at least the anycompatible type a nice public workout with this?

Yes, I'm quite sure. If the idea crashes and burns for some reason,
we'll be glad we didn't buy into it full-speed-ahead right away.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Pavel Stehule (#6)

Re: Compatible defaults for LEAD/LAG

Pavel Stehule <pavel.stehule@gmail.com> writes:

po 1. 6. 2020 v 4:07 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

That's just the tip of the iceberg, though. If you consider all the
old-style polymorphic types, we have [for example]
array_append(anyarray,anyelement)

I am not sure, if using anycompatible for buildin's array functions can be
good idea. Theoretically a users can do new performance errors due hidden
cast of a longer array.

I don't buy that argument. If the query requires casting int4[] to
int8[], making the user do it by hand isn't going to improve performance
over having the parser insert the coercion automatically. Sure, there
will be some fraction of queries that could be rewritten to avoid the
need for any cast, but so what? Often the performance difference isn't
going to matter; and when it does, I don't see that this is any different
from hundreds of other cases in which there are more-efficient and
less-efficient ways to write a query. SQL is full of performance traps
and always will be. You're also assuming that when the user gets an
"operator does not exist" error from "int4[] || int8", that will magically
lead them to choosing an optimal substitute. I know of no reason to
believe that --- it's at least as likely that they'll conclude it just
can't be done, as in the LAG() example we started the thread with. So
I think most people would be much happier if the system just did something
reasonable, and they can optimize later if it's important.

When I
though about this cases, and about designing functions compatible with
Oracle I though about another generic family (families) with different
behave (specified by suffix or maybe with typemod or with some syntax):

So we're already deciding anycompatible can't get the job done? Maybe
it's a good thing we had this conversation now. It's not too late to
rip the feature out of v13 altogether, and try again later. But if
you think I'm going to commit yet another variant of polymorphism on
top of this one, you're mistaken.

regards, tom lane

Pavel Stehule

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Tom Lane (#7)

Re: Compatible defaults for LEAD/LAG

po 1. 6. 2020 v 17:36 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:

po 1. 6. 2020 v 4:07 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

That's just the tip of the iceberg, though. If you consider all the
old-style polymorphic types, we have [for example]
array_append(anyarray,anyelement)

I am not sure, if using anycompatible for buildin's array functions can

be

good idea. Theoretically a users can do new performance errors due hidden
cast of a longer array.

I don't buy that argument. If the query requires casting int4[] to
int8[], making the user do it by hand isn't going to improve performance
over having the parser insert the coercion automatically. Sure, there
will be some fraction of queries that could be rewritten to avoid the
need for any cast, but so what? Often the performance difference isn't
going to matter; and when it does, I don't see that this is any different
from hundreds of other cases in which there are more-efficient and
less-efficient ways to write a query. SQL is full of performance traps
and always will be. You're also assuming that when the user gets an
"operator does not exist" error from "int4[] || int8", that will magically
lead them to choosing an optimal substitute. I know of no reason to
believe that --- it's at least as likely that they'll conclude it just
can't be done, as in the LAG() example we started the thread with. So
I think most people would be much happier if the system just did something
reasonable, and they can optimize later if it's important.

When I
though about this cases, and about designing functions compatible with
Oracle I though about another generic family (families) with different
behave (specified by suffix or maybe with typemod or with some syntax):

So we're already deciding anycompatible can't get the job done? Maybe
it's a good thing we had this conversation now. It's not too late to
rip the feature out of v13 altogether, and try again later. But if
you think I'm going to commit yet another variant of polymorphism on
top of this one, you're mistaken.

anycompatible types are fully conssistent with Postgres buildin functions
supported by parser. It is main benefit and important benefit.
Without anycompatible types is pretty hard to write variadic functions with
friendly behave like buildin functions has.
It can be perfect for LAG() example. It does almost same work what we did
manually in parser.

Surely, it is not compatible with Oracle's polymorphism rules, because

a) Our and Postgres type system is different (sometimes very different).

b) Oracle's casting rules depends on argument positions and some specific
exceptions - so it is not possible to map it on Postgres type system (or
system of polymorphic types).

I wrote and I spent lot of time on this feature to be used - by core
developers, by extension's developers. Like lot of other feature - it can
carry more comfort with some risk of performance issues.

On second hand if we use this feature for buildin functions, there is zero
impact of current applications - there should not be any problem with
compatibility or performance.

Maybe I am too old, and last year was too terrible so I have a problem to
imagine a "intelligent" user now :)

Regards

Pavel

Although I can imagine other enhancing of polymorphic types, I don't
propose any new, and I don't expect any enhancing in near feature. Is true,
so there are not requests and I think so "anycompatible" and "any" are more
than good enough for 99.99% developers.

I am little bit surprised so semi compatibility mode implemented in Orafce
is good enough and full compatibility with Oracle a) isn't possible, b)
isn't requested by visible group of users (or users who need it are
satisfied by EDB).

Show quoted text

regards, tom lane

Pavel Stehule

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Vik Fearing (#3)

Re: Compatible defaults for LEAD/LAG

ne 31. 5. 2020 v 22:02 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:

On 5/31/20 9:53 PM, Tom Lane wrote:

Vik Fearing <vik@postgresfriends.org> writes:

postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
postgres-# ORDER BY n;
ERROR: function lag(numeric, integer, integer) does not exist
LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
^

Yeah, we have similar issues elsewhere.

Attached is a patch that fixes this issue using the new anycompatible
pseudotype. I am hoping this can be slipped into 13 even though it
requires a catversion bump after BETA1.

When the anycompatible patch went in, I thought for a little bit about
trying to use it with existing built-in functions, but didn't have the
time to investigate the issue in detail. I'm not in favor of hacking
things one-function-at-a-time here; we should look through the whole
library and see what we've got.

The main thing that makes this perhaps-not-trivial is that an
anycompatible-ified function will match more cases than it would have
before, possibly causing conflicts if the function or operator name
is overloaded. We'd have to look at such cases and decide what we
want to do --- one answer would be to drop some of the alternatives
and rely on the parser to add casts, but that might slow things down.

Anyway, I agree that this is a good direction to pursue, but not in
a last-minute-hack-for-v13 way.

Fair enough. I put it in the commitfest app for version 14.
https://commitfest.postgresql.org/28/2574/
--
Vik Fearing

The patch is ok. It is almost trivial. It solves one issue, but maybe it
introduces a new issue.

Other databases try to coerce default constant expression to value
expression. I found a description about this behaviour for MSSQL, Sybase,
BigQuery.

I didn't find related documentation for Oracle, and I have not a access to
some running instance of Oracle to test it.

Ansi SQL say - type of default expression should be compatible with lag
expression, and declared type of result should be type of value expression.

IWD 9075-2:201?(E) 6.10 <window function> - j) v)

Current implementation is limited (and the behaviour is not user friendly
in all details), but new behaviour (implemented by patch) is in half cases
out of standard :(.

These cases are probably not often - and they are really corner cases, but
I cannot to qualify how much important this fact is.

For users, the implemented feature is better, and it is safe.
Implementation is trivial - is significantly simpler than implementation
that is 100% standard, although it should not be a hard problem too (in
analyze stage it probably needs a few lines too).

There has to be a decision, because now we can go in both directions. After
choosing there will not be a way back.

Regards

Pavel

#10

Daniel Gustafsson

daniel@yesql.se

almost 6 years ago

In reply to: Pavel Stehule (#9)

Re: Compatible defaults for LEAD/LAG

On 13 Jul 2020, at 19:23, Pavel Stehule <pavel.stehule@gmail.com> wrote:

ne 31. 5. 2020 v 22:02 odesílatel Vik Fearing <vik@postgresfriends.org <mailto:vik@postgresfriends.org>> napsal:
On 5/31/20 9:53 PM, Tom Lane wrote:

Vik Fearing <vik@postgresfriends.org <mailto:vik@postgresfriends.org>> writes:

postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
postgres-# ORDER BY n;
ERROR: function lag(numeric, integer, integer) does not exist
LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
^

Yeah, we have similar issues elsewhere.

Attached is a patch that fixes this issue using the new anycompatible
pseudotype. I am hoping this can be slipped into 13 even though it
requires a catversion bump after BETA1.

When the anycompatible patch went in, I thought for a little bit about
trying to use it with existing built-in functions, but didn't have the
time to investigate the issue in detail. I'm not in favor of hacking
things one-function-at-a-time here; we should look through the whole
library and see what we've got.

The main thing that makes this perhaps-not-trivial is that an
anycompatible-ified function will match more cases than it would have
before, possibly causing conflicts if the function or operator name
is overloaded. We'd have to look at such cases and decide what we
want to do --- one answer would be to drop some of the alternatives
and rely on the parser to add casts, but that might slow things down.

Anyway, I agree that this is a good direction to pursue, but not in
a last-minute-hack-for-v13 way.

Fair enough. I put it in the commitfest app for version 14.
https://commitfest.postgresql.org/28/2574/ <https://commitfest.postgresql.org/28/2574/>
--
Vik Fearing

The patch is ok. It is almost trivial. It solves one issue, but maybe it introduces a new issue.

Other databases try to coerce default constant expression to value expression. I found a description about this behaviour for MSSQL, Sybase, BigQuery.

I didn't find related documentation for Oracle, and I have not a access to some running instance of Oracle to test it.

Ansi SQL say - type of default expression should be compatible with lag expression, and declared type of result should be type of value expression.

IWD 9075-2:201?(E) 6.10 <window function> - j) v)

Current implementation is limited (and the behaviour is not user friendly in all details), but new behaviour (implemented by patch) is in half cases out of standard :(.

These cases are probably not often - and they are really corner cases, but I cannot to qualify how much important this fact is.

For users, the implemented feature is better, and it is safe. Implementation is trivial - is significantly simpler than implementation that is 100% standard, although it should not be a hard problem too (in analyze stage it probably needs a few lines too).

There has to be a decision, because now we can go in both directions. After choosing there will not be a way back.

This patch is marked waiting for author, but it's not clear from this review
what we're waiting on. Should this be moved to Needs review or Ready for
Committer instead to solicit the input from a committer? ISTM the latter is
more suitable. Or did I misunderstand your review?

cheers ./daniel

#11

Pavel Stehule

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Daniel Gustafsson (#10)

Re: Compatible defaults for LEAD/LAG

čt 23. 7. 2020 v 13:29 odesílatel Daniel Gustafsson <daniel@yesql.se>
napsal:

On 13 Jul 2020, at 19:23, Pavel Stehule <pavel.stehule@gmail.com> wrote:

ne 31. 5. 2020 v 22:02 odesílatel Vik Fearing <vik@postgresfriends.org

<mailto:vik@postgresfriends.org>> napsal:

On 5/31/20 9:53 PM, Tom Lane wrote:

Vik Fearing <vik@postgresfriends.org <mailto:vik@postgresfriends.org>>

writes:

postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
postgres-# ORDER BY n;
ERROR: function lag(numeric, integer, integer) does not exist
LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
^

Yeah, we have similar issues elsewhere.

Attached is a patch that fixes this issue using the new anycompatible
pseudotype. I am hoping this can be slipped into 13 even though it
requires a catversion bump after BETA1.

When the anycompatible patch went in, I thought for a little bit about
trying to use it with existing built-in functions, but didn't have the
time to investigate the issue in detail. I'm not in favor of hacking
things one-function-at-a-time here; we should look through the whole
library and see what we've got.

The main thing that makes this perhaps-not-trivial is that an
anycompatible-ified function will match more cases than it would have
before, possibly causing conflicts if the function or operator name
is overloaded. We'd have to look at such cases and decide what we
want to do --- one answer would be to drop some of the alternatives
and rely on the parser to add casts, but that might slow things down.

Anyway, I agree that this is a good direction to pursue, but not in
a last-minute-hack-for-v13 way.

Fair enough. I put it in the commitfest app for version 14.
https://commitfest.postgresql.org/28/2574/ <

https://commitfest.postgresql.org/28/2574/>

--
Vik Fearing

The patch is ok. It is almost trivial. It solves one issue, but maybe it

introduces a new issue.

Other databases try to coerce default constant expression to value

expression. I found a description about this behaviour for MSSQL, Sybase,
BigQuery.

I didn't find related documentation for Oracle, and I have not a access

to some running instance of Oracle to test it.

Ansi SQL say - type of default expression should be compatible with lag

expression, and declared type of result should be type of value expression.

IWD 9075-2:201?(E) 6.10 <window function> - j) v)

Current implementation is limited (and the behaviour is not user

friendly in all details), but new behaviour (implemented by patch) is in
half cases out of standard :(.

These cases are probably not often - and they are really corner cases,

but I cannot to qualify how much important this fact is.

For users, the implemented feature is better, and it is safe.

Implementation is trivial - is significantly simpler than implementation
that is 100% standard, although it should not be a hard problem too (in
analyze stage it probably needs a few lines too).

There has to be a decision, because now we can go in both directions.

After choosing there will not be a way back.

This patch is marked waiting for author, but it's not clear from this
review
what we're waiting on. Should this be moved to Needs review or Ready for
Committer instead to solicit the input from a committer? ISTM the latter
is
more suitable. Or did I misunderstand your review?

Unfortunately, I don't know - I am not sure about committing this patch,
and I am not able to qualify for impact.

This is nice example of usage of anycompatible type (that is consistent
with other things in Postgres), but standard says something else.

It can be easily solved with https://commitfest.postgresql.org/28/2081/ -
but Tom doesn't like this patch.

I am more inclined to think so this feature should be implemented
differently - there is no strong reason to go against standard or against
the implementations of other databases (and increase the costs of porting).
Now the implementation is limited, but allowed behaviour is 100% ANSI.

On second hand, the implemented feature (behaviour) is pretty small, and
really corner case, so maybe a simple solution (although it isn't perfect)
is good.

So I would listen to the opinions of other people.

Regards

Pavel

Show quoted text

cheers ./daniel

#12

Tom Lane

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Pavel Stehule (#11)

Re: Compatible defaults for LEAD/LAG

Pavel Stehule <pavel.stehule@gmail.com> writes:

This is nice example of usage of anycompatible type (that is consistent
with other things in Postgres), but standard says something else.
It can be easily solved with https://commitfest.postgresql.org/28/2081/ -
but Tom doesn't like this patch.
I am more inclined to think so this feature should be implemented
differently - there is no strong reason to go against standard or against
the implementations of other databases (and increase the costs of porting).
Now the implementation is limited, but allowed behaviour is 100% ANSI.

I don't particularly buy this argument. The case at hand is what to do
if we have, say,

select lag(integer_column, 1, 1.2) over ...

The proposed patch would result in the output being of type numeric,
and any rows using the default would show "1.2". The spec says that
the right thing is to return integer, and we should round the default
to "1" to make that work. But

(1) I doubt that anybody actually writes such things;

(2) For anyone who does write it, the spec's behavior fails to meet
the principle of least surprise. It is not normally the case that
any information-losing cast would be applied silently within an
expression.

So this deviation from spec doesn't bother me; we have much bigger ones.

My concern with this patch is what I said upthread: I'm not sure that
we should be adjusting this behavior in a piecemeal fashion. I looked
through pg_proc to find all the functions that have more than one any*
argument, and found these:

Now, there's no point in changing range_eq and the later entries in this
table (the ones taking two anyrange's); our rather lame definition of
anycompatiblerange means that we'd get no benefit from doing so. But
I think there's a strong case for changing everything before range_eq.
It'd be nice if something like

SELECT array[1] < array[1.1];

would work the same way that "SELECT 1 < 1.1" would.

I checked the other concern that I had about whether the more flexible
polymorphic definitions would create any new ambiguities, and I don't
see any problems with this list. As functions, none of these names are
overloaded, except with different numbers of arguments so there's no
ambiguity. Most of the array functions are also operators, but the
competing operators do not take arrays, so it doesn't look like there's
any issue on that side either.

So I think we should be more ambitious and generalize all of these
to use anycompatiblearray etc.

regards, tom lane

#13

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Tom Lane (#12)

Re: Compatible defaults for LEAD/LAG

ne 30. 8. 2020 v 23:59 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:

This is nice example of usage of anycompatible type (that is consistent
with other things in Postgres), but standard says something else.
It can be easily solved with https://commitfest.postgresql.org/28/2081/

-

but Tom doesn't like this patch.
I am more inclined to think so this feature should be implemented
differently - there is no strong reason to go against standard or against
the implementations of other databases (and increase the costs of

porting).

Now the implementation is limited, but allowed behaviour is 100% ANSI.

I don't particularly buy this argument. The case at hand is what to do
if we have, say,

select lag(integer_column, 1, 1.2) over ...

The proposed patch would result in the output being of type numeric,
and any rows using the default would show "1.2". The spec says that
the right thing is to return integer, and we should round the default
to "1" to make that work. But

(1) I doubt that anybody actually writes such things;

(2) For anyone who does write it, the spec's behavior fails to meet
the principle of least surprise. It is not normally the case that
any information-losing cast would be applied silently within an
expression.

postgres=# create table foo(a int);
CREATE TABLE
postgres=# insert into foo values(1.1);
INSERT 0 1

postgres=# create table foo(a int default 1.1);
CREATE TABLE
postgres=# insert into foo values(default);
INSERT 0 1
postgres=# select * from foo;
┌───┐
│ a │
╞═══╡
│ 1 │
└───┘
(1 row)

It is maybe strange, but it is not an unusual pattern in SQL. I think it
can be analogy with default clause

DECLARE a int DEFAULT 1.2;

The default value doesn't change a type of variable. This is maybe too
stupid example. There can be other little bit more realistic

CREATE OR REPLACE FUNCTION foo(a numeric, b numeric, ...
DECLARE x int DEFAULT a;
BEGIN
...

I am afraid about performance - if default value can change type, then some
other things can stop work - like using index.

For *this* case we don't speak about some operations between two
independent operands or function arguments. We are speaking about
the value and about a *default* for the value.

So this deviation from spec doesn't bother me; we have much bigger ones.

ok, if it is acceptable for other people, I can accept it too - I
understand well so it is a corner case and there is some consistency with
other Postgres features.

Maybe this difference should be mentioned in documentation.

My concern with this patch is what I said upthread: I'm not sure that
we should be adjusting this behavior in a piecemeal fashion. I looked
through pg_proc to find all the functions that have more than one any*
argument, and found these:

oid | prorettype
-----------------------------------------------+------------
lag(anyelement,integer,anyelement) | anyelement
lead(anyelement,integer,anyelement) | anyelement
width_bucket(anyelement,anyarray) | integer
btarraycmp(anyarray,anyarray) | integer
array_eq(anyarray,anyarray) | boolean
array_ne(anyarray,anyarray) | boolean
array_lt(anyarray,anyarray) | boolean
array_gt(anyarray,anyarray) | boolean
array_le(anyarray,anyarray) | boolean
array_ge(anyarray,anyarray) | boolean
array_append(anyarray,anyelement) | anyarray
array_prepend(anyelement,anyarray) | anyarray
array_cat(anyarray,anyarray) | anyarray
array_larger(anyarray,anyarray) | anyarray
array_smaller(anyarray,anyarray) | anyarray
array_position(anyarray,anyelement) | integer
array_position(anyarray,anyelement,integer) | integer
array_positions(anyarray,anyelement) | integer[]
array_remove(anyarray,anyelement) | anyarray
array_replace(anyarray,anyelement,anyelement) | anyarray
arrayoverlap(anyarray,anyarray) | boolean
arraycontains(anyarray,anyarray) | boolean
arraycontained(anyarray,anyarray) | boolean
elem_contained_by_range(anyelement,anyrange) | boolean
range_contains_elem(anyrange,anyelement) | boolean
range_eq(anyrange,anyrange) | boolean
range_ne(anyrange,anyrange) | boolean
range_overlaps(anyrange,anyrange) | boolean
range_contains(anyrange,anyrange) | boolean
range_contained_by(anyrange,anyrange) | boolean
range_adjacent(anyrange,anyrange) | boolean
range_before(anyrange,anyrange) | boolean
range_after(anyrange,anyrange) | boolean
range_overleft(anyrange,anyrange) | boolean
range_overright(anyrange,anyrange) | boolean
range_union(anyrange,anyrange) | anyrange
range_merge(anyrange,anyrange) | anyrange
range_intersect(anyrange,anyrange) | anyrange
range_minus(anyrange,anyrange) | anyrange
range_cmp(anyrange,anyrange) | integer
range_lt(anyrange,anyrange) | boolean
range_le(anyrange,anyrange) | boolean
range_ge(anyrange,anyrange) | boolean
range_gt(anyrange,anyrange) | boolean
range_gist_same(anyrange,anyrange,internal) | internal
(45 rows)

Now, there's no point in changing range_eq and the later entries in this
table (the ones taking two anyrange's); our rather lame definition of
anycompatiblerange means that we'd get no benefit from doing so. But
I think there's a strong case for changing everything before range_eq.
It'd be nice if something like

SELECT array[1] < array[1.1];

would work the same way that "SELECT 1 < 1.1" would.

There it has sense without any discussion. But it is a little bit different
than using the default value in the LAG function.

I checked the other concern that I had about whether the more flexible
polymorphic definitions would create any new ambiguities, and I don't
see any problems with this list. As functions, none of these names are
overloaded, except with different numbers of arguments so there's no
ambiguity. Most of the array functions are also operators, but the
competing operators do not take arrays, so it doesn't look like there's
any issue on that side either.

So I think we should be more ambitious and generalize all of these
to use anycompatiblearray etc.

Pavel

Show quoted text

regards, tom lane

#14

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#13)

Re: Compatible defaults for LEAD/LAG

po 31. 8. 2020 v 7:05 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

ne 30. 8. 2020 v 23:59 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:

This is nice example of usage of anycompatible type (that is consistent
with other things in Postgres), but standard says something else.
It can be easily solved with https://commitfest.postgresql.org/28/2081/

-

but Tom doesn't like this patch.
I am more inclined to think so this feature should be implemented
differently - there is no strong reason to go against standard or

against

the implementations of other databases (and increase the costs of

porting).

Now the implementation is limited, but allowed behaviour is 100% ANSI.

I don't particularly buy this argument. The case at hand is what to do
if we have, say,

select lag(integer_column, 1, 1.2) over ...

The proposed patch would result in the output being of type numeric,
and any rows using the default would show "1.2". The spec says that
the right thing is to return integer, and we should round the default
to "1" to make that work. But

(1) I doubt that anybody actually writes such things;

(2) For anyone who does write it, the spec's behavior fails to meet
the principle of least surprise. It is not normally the case that
any information-losing cast would be applied silently within an
expression.

postgres=# create table foo(a int);
CREATE TABLE
postgres=# insert into foo values(1.1);
INSERT 0 1

postgres=# create table foo(a int default 1.1);
CREATE TABLE
postgres=# insert into foo values(default);
INSERT 0 1
postgres=# select * from foo;
┌───┐
│ a │
╞═══╡
│ 1 │
└───┘
(1 row)

It is maybe strange, but it is not an unusual pattern in SQL. I think it
can be analogy with default clause

DECLARE a int DEFAULT 1.2;

The default value doesn't change a type of variable. This is maybe too
stupid example. There can be other little bit more realistic

CREATE OR REPLACE FUNCTION foo(a numeric, b numeric, ...
DECLARE x int DEFAULT a;
BEGIN
...

I am afraid about performance - if default value can change type, then
some other things can stop work - like using index.

For *this* case we don't speak about some operations between two
independent operands or function arguments. We are speaking about
the value and about a *default* for the value.

So this deviation from spec doesn't bother me; we have much bigger ones.

ok, if it is acceptable for other people, I can accept it too - I
understand well so it is a corner case and there is some consistency with
other Postgres features.

Maybe this difference should be mentioned in documentation.

I thought more about this problem, and I think so ANSI specification is
semantically fully correct - it is consistent with application of default
value elsewhere in SQL environment.

In this case the optional argument is not "any" expression. It is the
default value for some expression . In other cases we usually use forced
explicit cast.

Unfortunately we do not have good tools for generic implementation of this
situation. In other cases there the functions have special support in
parser for this case (example xmltable)

I see few possibilities how to finish and close this issue:

1. use anycompatible type and add note to documentation so expression of
optional argument can change a result type, and so this is Postgres
specific - other databases and ANSI SQL disallow this.
It is the most simple solution, and it is good enough for this case, that
is not extra important.

2. choose the correct type somewhere inside the parser - for these two
functions.

3. introduce and implement some generic solution for this case - it can be
implemented on C level via some function helper or on syntax level

CREATE OR REPLACE FUNCTION lag(a anyelement, offset int, default defval
a%type) ...

"defval argname%type" means for caller's expression "CAST(defval AS
typeof(argname))"

@3 can be a very interesting and useful feature, but it needs an agreement
and harder work
@2 this is 100% correct solution without hard work (but I am not sure if
there can be an agreement on this implementation)
@1 it is good enough for this issue without harder work and probably there
we can find an agreement simply.

Regards

Pavel

#15

Tom Lane

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Pavel Stehule (#14)

Re: Compatible defaults for LEAD/LAG

Pavel Stehule <pavel.stehule@gmail.com> writes:

I see few possibilities how to finish and close this issue:
1. use anycompatible type and add note to documentation so expression of
optional argument can change a result type, and so this is Postgres
specific - other databases and ANSI SQL disallow this.
It is the most simple solution, and it is good enough for this case, that
is not extra important.
2. choose the correct type somewhere inside the parser - for these two
functions.
3. introduce and implement some generic solution for this case - it can be
implemented on C level via some function helper or on syntax level
CREATE OR REPLACE FUNCTION lag(a anyelement, offset int, default defval
a%type) ...
"defval argname%type" means for caller's expression "CAST(defval AS
typeof(argname))"

I continue to feel that the spec's definition of this is not so
obviously right that we should jump through hoops to duplicate it.
In fact, I don't even agree that we need a disclaimer in the docs
saying that it's not quite the same. Only the most nitpicky
language lawyers would ever even notice.

In hopes of moving this along a bit, I experimented with converting
the other functions I listed to use anycompatible. I soon found that
touching any functions/operators that are listed in operator classes
would open a can of worms far larger than I'd previously supposed.
To maintain consistency, we'd have to propagate the datatype changes
to *every* function/operator associated with those opfamilies --- many
of which only have one any-foo input and thus aren't on my original
list. (An example here is that extending btree array_ops will require
changing the max(anyarray) and min(anyarray) aggregates too.) We'd
then end up with a situation that would be rather confusing from a
user's standpoint, in that it'd be quite un-obvious why some array
functions use anyarray while other ones use anycompatiblearray.

So I backed off to just changing the functions/operators that have
no opclass connections, such as array_cat. Even that has some
downsides --- for example, in the attached patch, it's necessary
to change some polymorphism.sql examples that explicitly reference
array_cat(anyarray). I wonder whether this change would break a
significant number of user-defined aggregates or operators.

(Note that I think we'd have to resist the temptation to fix that
by letting CREATE AGGREGATE et al accept support functions that
take anyarray/anycompatiblearray (etc) interchangeably. A lot of
the security analysis that went into CVE-2020-14350 depended on
the assumption that utility commands only do exact lookups of
support functions. If we tried to be lax about this, we'd
re-introduce the possibility for hostile capture of function
references in extension scripts.)

Another interesting issue, not seen in the attached but which
came up while I was experimenting with the more aggressive patch,
was this failure in the polymorphism test:

 select max(histogram_bounds) from pg_stats where tablename = 'pg_am';
-ERROR:  cannot compare arrays of different element types
+ERROR:  function max(anyarray) does not exist

That's because we don't accept pg_stats.histogram_bounds (of
declared type anyarray) as a valid input for anycompatiblearray.
I wonder if that isn't a bug we need to fix in the anycompatible
patch, independently of anything else. We'd not be able to deduce
an actual element type from such an input, but we already cannot
do so when we match it to an anyarray parameter.

Anyway, attached find

0001 - updates Vik's original patch to HEAD and tweaks the
grammar in the docs a bit.

0002 - add-on patch to convert array_append, array_prepend,
array_cat, array_position, array_positions, array_remove,
array_replace, and width_bucket to use anycompatiblearray.

I think 0001 is committable, but 0002 is just WIP since
I didn't touch the docs. I'm slightly discouraged about
whether 0002 is worth proceeding with. Any thoughts?

regards, tom lane

#16

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Tom Lane (#15)

Re: Compatible defaults for LEAD/LAG

út 22. 9. 2020 v 2:33 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I see few possibilities how to finish and close this issue:
1. use anycompatible type and add note to documentation so expression of
optional argument can change a result type, and so this is Postgres
specific - other databases and ANSI SQL disallow this.
It is the most simple solution, and it is good enough for this case, that
is not extra important.
2. choose the correct type somewhere inside the parser - for these two
functions.
3. introduce and implement some generic solution for this case - it can

be

implemented on C level via some function helper or on syntax level
CREATE OR REPLACE FUNCTION lag(a anyelement, offset int, default

defval

a%type) ...
"defval argname%type" means for caller's expression "CAST(defval AS
typeof(argname))"

I continue to feel that the spec's definition of this is not so
obviously right that we should jump through hoops to duplicate it.
In fact, I don't even agree that we need a disclaimer in the docs
saying that it's not quite the same. Only the most nitpicky
language lawyers would ever even notice.

In hopes of moving this along a bit, I experimented with converting
the other functions I listed to use anycompatible. I soon found that
touching any functions/operators that are listed in operator classes
would open a can of worms far larger than I'd previously supposed.
To maintain consistency, we'd have to propagate the datatype changes
to *every* function/operator associated with those opfamilies --- many
of which only have one any-foo input and thus aren't on my original
list. (An example here is that extending btree array_ops will require
changing the max(anyarray) and min(anyarray) aggregates too.) We'd
then end up with a situation that would be rather confusing from a
user's standpoint, in that it'd be quite un-obvious why some array
functions use anyarray while other ones use anycompatiblearray.

So I backed off to just changing the functions/operators that have
no opclass connections, such as array_cat. Even that has some
downsides --- for example, in the attached patch, it's necessary
to change some polymorphism.sql examples that explicitly reference
array_cat(anyarray). I wonder whether this change would break a
significant number of user-defined aggregates or operators.

(Note that I think we'd have to resist the temptation to fix that
by letting CREATE AGGREGATE et al accept support functions that
take anyarray/anycompatiblearray (etc) interchangeably. A lot of
the security analysis that went into CVE-2020-14350 depended on
the assumption that utility commands only do exact lookups of
support functions. If we tried to be lax about this, we'd
re-introduce the possibility for hostile capture of function
references in extension scripts.)

Another interesting issue, not seen in the attached but which
came up while I was experimenting with the more aggressive patch,
was this failure in the polymorphism test:
select max(histogram_bounds) from pg_stats where tablename = 'pg_am';
-ERROR:  cannot compare arrays of different element types
+ERROR:  function max(anyarray) does not exist
That's because we don't accept pg_stats.histogram_bounds (of
declared type anyarray) as a valid input for anycompatiblearray.
I wonder if that isn't a bug we need to fix in the anycompatible
patch, independently of anything else. We'd not be able to deduce
an actual element type from such an input, but we already cannot
do so when we match it to an anyarray parameter.

Anyway, attached find

0001 - updates Vik's original patch to HEAD and tweaks the
grammar in the docs a bit.

0002 - add-on patch to convert array_append, array_prepend,
array_cat, array_position, array_positions, array_remove,
array_replace, and width_bucket to use anycompatiblearray.

I think 0001 is committable, but 0002 is just WIP since
I didn't touch the docs. I'm slightly discouraged about
whether 0002 is worth proceeding with. Any thoughts?

I think so 0002 has sense - more than doc I miss related regress tests, but
it is partially covered by anycompatible tests

Anyway I tested both patches and there is not problem with compilation,
building doc, and make check-world passed

I'll mark this patch as ready for committer

Best regards

Pavel

Show quoted text

regards, tom lane

#17

Tom Lane

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Pavel Stehule (#16)

Re: Compatible defaults for LEAD/LAG

Pavel Stehule <pavel.stehule@gmail.com> writes:

út 22. 9. 2020 v 2:33 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Anyway, attached find
0001 - updates Vik's original patch to HEAD and tweaks the
grammar in the docs a bit.
0002 - add-on patch to convert array_append, array_prepend,
array_cat, array_position, array_positions, array_remove,
array_replace, and width_bucket to use anycompatiblearray.
I think 0001 is committable, but 0002 is just WIP since
I didn't touch the docs. I'm slightly discouraged about
whether 0002 is worth proceeding with. Any thoughts?

I think so 0002 has sense - more than doc I miss related regress tests, but
it is partially covered by anycompatible tests

I didn't see any need for particularly exhaustive testing, but
I did add one new test for an operator and one for a function.
Pushed with that and the necessary docs work.

regards, tom lane

#18

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Tom Lane (#17)

Re: Compatible defaults for LEAD/LAG

st 4. 11. 2020 v 22:12 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:

út 22. 9. 2020 v 2:33 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Anyway, attached find
0001 - updates Vik's original patch to HEAD and tweaks the
grammar in the docs a bit.
0002 - add-on patch to convert array_append, array_prepend,
array_cat, array_position, array_positions, array_remove,
array_replace, and width_bucket to use anycompatiblearray.
I think 0001 is committable, but 0002 is just WIP since
I didn't touch the docs. I'm slightly discouraged about
whether 0002 is worth proceeding with. Any thoughts?

I think so 0002 has sense - more than doc I miss related regress tests,

but

it is partially covered by anycompatible tests

I didn't see any need for particularly exhaustive testing, but
I did add one new test for an operator and one for a function.
Pushed with that and the necessary docs work.

ok, Thank you

Pavel

Show quoted text

regards, tom lane

Compatible defaults for LEAD/LAG

Attachments:

Attachments: