Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

Started by Oliver Fordover 7 years ago122 messages
Jump to latest
#1Oliver Ford
ojford@gmail.com

Adds the options RESPECT/IGNORE NULLS (null treatment clause) and FROM
FIRST/LAST to the non-aggregate window functions.

A previous patch
(/messages/by-id/CA+=vxNa5_N1q5q5OkxC0aQnNdbo2Ru6GVw+86wk+oNsUNJDLig@mail.gmail.com)
partially implemented this feature. However, that patch worked by
adding the null treatment clause to the window frame's frameOptions
variable, and consequently had the limitation that it wasn't possible
to reuse a window frame definition in a single query where two
functions were called that had different null treatment options. This
meant that the patch was never committed. The attached path takes a
different approach which gets around this limitation.

For example, the following query would not work correctly with the
implementation in the old patch but does with the attached patch:

WITH cte (x) AS (
select null union select 1 union select 2)
SELECT x,
first_value(x) over w as with_default,
first_value(x) respect nulls over w as with_respect,
first_value(x) ignore nulls over w as with_ignore
from cte WINDOW w as (order by x nulls first rows between unbounded
preceding and unbounded following);
x | with_default | with_respect | with_ignore
---+--------------+--------------+-------------
| | | 1
1 | | | 1
2 | | | 1
(3 rows)

== Implementation ==

The patch adds two types to the pg_type catalog: "ignorenulls" and
"fromlast". These types are of the Boolean category, and work as
wrappers around the bool type. They are used as function arguments to
extra versions of the window functions that take additional boolean
arguments. RESPECT NULLS and FROM FIRST are ignored by the parser, but
IGNORE NULLS and FROM LAST lead to the extra versions being called
with arguments to ignore nulls and order from last.

== Testing ==

Updated documentation and added regression tests. All existing tests
pass. This change will need a catversion bump.
Thanks to Krasiyan Andreev for initially testing this patch.

Attachments:

0001-respect.patchapplication/octet-stream; name=0001-respect.patchDownload+1037-55
#2David Fetter
david@fetter.org
In reply to: Oliver Ford (#1)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

On Fri, Jul 13, 2018 at 01:52:00PM +0100, Oliver Ford wrote:

Adds the options RESPECT/IGNORE NULLS (null treatment clause) and FROM
FIRST/LAST to the non-aggregate window functions.

Please find attached an updated version for OID drift.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachments:

0001-RESPECT-IGNORE-NULLS-FIRST-LAST-in-windowing-functio.patchtext/x-diff; charset=us-asciiDownload+1085-56
#3Krasiyan Andreev
krasiyan@gmail.com
In reply to: David Fetter (#2)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

Hi,
Patch applies and compiles, all included tests and building of the docs
pass.
I am using last version from more than two months ago in production
environment with real data and I didn't find any bugs,
so I'm marking this patch as ready for committer in the commitfest app.

На сб, 28.07.2018 г. в 22:00 ч. David Fetter <david@fetter.org> написа:

Show quoted text

On Fri, Jul 13, 2018 at 01:52:00PM +0100, Oliver Ford wrote:

Adds the options RESPECT/IGNORE NULLS (null treatment clause) and FROM
FIRST/LAST to the non-aggregate window functions.

Please find attached an updated version for OID drift.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#4Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Krasiyan Andreev (#3)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

"Krasiyan" == Krasiyan Andreev <krasiyan@gmail.com> writes:

Krasiyan> Hi,

Krasiyan> Patch applies and compiles, all included tests and building
Krasiyan> of the docs pass. I am using last version from more than two
Krasiyan> months ago in production environment with real data and I
Krasiyan> didn't find any bugs, so I'm marking this patch as ready for
Krasiyan> committer in the commitfest app.

Unfortunately, reviewing it from a committer perspective - I can't
possibly commit this as it stands, and anything I did to it would be
basically a rewrite of much of it.

Some of the problems could be fixed. For example the type names could be
given pg_* prefixes (it's clearly not acceptable to create random
special-purpose boolean subtypes in pg_catalog and _not_ give them such
a prefix), and the precedence hackery in gram.y could have comments
added (gram.y is already bad enough; _anything_ fancy with precedence
has to be described in the comments). But I don't like that hack with
the special types at all, and I think that needs a better solution.

Normally I'd push hard to try and get some solution that's sufficiently
generic to allow user-defined functions to make use of the feature. But
I think the SQL spec people have managed to make that literally
impossible in this case, what with the FROM keyword appearing in the
middle of a production and not followed by anything sufficiently
distinctive to even use for extra token lookahead.

Also, as has been pointed out in a number of previous features, we're
starting to accumulate identifiers that are reserved in subtly different
ways from our basic four-category system (which is itself a significant
elaboration compared to the spec's simple reserved/unreserved
distinction). As I recall this objection was specifically raised for
CUBE, but justified there by the existence of the contrib/cube extension
(and the fact that the standard CUBE() construct is used only in very
specific places in the syntax). This patch would make lead / lag /
first_value / last_value / nth_value syntactically "special" while not
actually reserving them (beyond having them in unreserved_keywords); I
think serious consideration should be given to whether they should
instead become col_name_keywords (which would, I believe, make it
unnecessary to mess with precedence).

Anyone have any thoughts or comments on the above?

--
Andrew (irc:RhodiumToad)

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#4)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

Normally I'd push hard to try and get some solution that's sufficiently
generic to allow user-defined functions to make use of the feature. But
I think the SQL spec people have managed to make that literally
impossible in this case, what with the FROM keyword appearing in the
middle of a production and not followed by anything sufficiently
distinctive to even use for extra token lookahead.

Yeah. Is there any appetite for a "Just Say No" approach? That is,
refuse to implement the spec's syntax on the grounds that it's too
brain-dead to even consider, and instead provide some less random,
more extensibility-friendly way to accomplish the same thing?

The FROM FIRST/LAST bit seems particularly badly thought through,
because AFAICS it is flat out ambiguous with a normal FROM clause
immediately following the window function call. The only way to
make it not so would be to make FIRST and LAST be fully reserved,
which is neither a good idea nor spec-compliant.

In short, there's a really good case to be made here that the SQL
committee is completely clueless about syntax design, and so we
shouldn't follow this particular pied piper.

... This patch would make lead / lag /
first_value / last_value / nth_value syntactically "special" while not
actually reserving them (beyond having them in unreserved_keywords); I
think serious consideration should be given to whether they should
instead become col_name_keywords (which would, I believe, make it
unnecessary to mess with precedence).

I agree that messing with the precedence rules is likely to have
unforeseen and undesirable side-effects. Generally, if you need
to create a precedence rule, that's because your grammar is
ambiguous. Precedence fixes that in a well-behaved way only for
cases that actually are very much like operator precedence rules.
Otherwise, you may just be papering over something that isn't
working very well. See e.g. commits 670a6c7a2 and 12b716457
for past cases where we learned that the hard way. (The latter
also points out that if you must have a precedence hack, it's
safer to hack individual rules than to stick precedences onto
terminal symbols.) In the case at hand, since the proposed patch
doesn't make FIRST and LAST be fully reserved, it seems just about
certain that it can be made to misbehave, including failing on
queries that were and should remain legal.

regards, tom lane

#6Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#5)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> The FROM FIRST/LAST bit seems particularly badly thought through,
Tom> because AFAICS it is flat out ambiguous with a normal FROM clause
Tom> immediately following the window function call. The only way to
Tom> make it not so would be to make FIRST and LAST be fully reserved,
Tom> which is neither a good idea nor spec-compliant.

In the actual spec syntax it's not ambiguous at all because NTH_VALUE is
a reserved word (as are LEAD, LAG, FIRST_VALUE and LAST_VALUE), and OVER
is a mandatory clause in its syntax, so a FROM appearing before the OVER
must be part of a FROM FIRST/LAST and not introducing a FROM-clause.

In our syntax, if we made NTH_VALUE etc. a col_name_keyword (and thus
not legal as a function name outside its own special syntax) it would
also become unambiguous.

i.e. given this token sequence (with . marking the current posision):

select nth_value(x) . from first ignore

if we know up front that "nth_value" is a window function and not any
other kind of function, we know that we have to shift the "from" rather
than reducing the select-list because we haven't seen an "over" yet.
(Neither "first" nor "ignore" are reserved, so "select foo(x) from first
ignore;" is a valid and complete query, and without reserving the
function name we'd need at least four tokens of lookahead to decide
otherwise.)

This is why I think the col_name_keyword option needs to be given
serious consideration - it still doesn't reserve the names as strongly
as the spec does, but enough to make the standard syntax work without
needing any dubious hacks.

--
Andrew (irc:RhodiumToad)

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#6)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> The FROM FIRST/LAST bit seems particularly badly thought through,
Tom> because AFAICS it is flat out ambiguous with a normal FROM clause
Tom> immediately following the window function call. The only way to
Tom> make it not so would be to make FIRST and LAST be fully reserved,
Tom> which is neither a good idea nor spec-compliant.

In the actual spec syntax it's not ambiguous at all because NTH_VALUE is
a reserved word (as are LEAD, LAG, FIRST_VALUE and LAST_VALUE), and OVER
is a mandatory clause in its syntax, so a FROM appearing before the OVER
must be part of a FROM FIRST/LAST and not introducing a FROM-clause.

Hmm ...

In our syntax, if we made NTH_VALUE etc. a col_name_keyword (and thus
not legal as a function name outside its own special syntax) it would
also become unambiguous.
i.e. given this token sequence (with . marking the current posision):
select nth_value(x) . from first ignore
if we know up front that "nth_value" is a window function and not any
other kind of function, we know that we have to shift the "from" rather
than reducing the select-list because we haven't seen an "over" yet.

I don't really find that to be a desirable solution, because quite aside
from the extensibility problem, it would mean that a lot of errors
become "syntax error" where we formerly gave a more useful message.

This does open up a thought about how to proceed, though. I'd been
trying to think of a way to solve this using base_yylex's ability to
do some internal lookahead and change token types based on that.
If you just think of recognizing FROM FIRST/LAST, you get nowhere
because that's still legal in other contexts. But if you were to
look for FROM followed by FIRST/LAST followed by IGNORE/RESPECT/OVER,
I think that could only validly happen in this syntax. It'd take
some work to extend base_yylex to look ahead 2 tokens not one, but
I'm sure that could be done. (You'd also need a lookahead rule to
match "IGNORE/RESPECT NULLS OVER", but that seems just as doable.)
Then the relevant productions use FROM_LA, IGNORE_LA, RESPECT_LA
instead of the corresponding bare tokens, and the grammar no longer
has an ambiguity problem.

regards, tom lane

#8Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#7)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> If you just think of recognizing FROM FIRST/LAST, you get nowhere
Tom> because that's still legal in other contexts. But if you were to
Tom> look for FROM followed by FIRST/LAST followed by
Tom> IGNORE/RESPECT/OVER, I think that could only validly happen in
Tom> this syntax.

No; you need to go four tokens ahead in total, not three. Assuming
nth_value is unreserved, then

select nth_value(x) from first ignore;

is a valid query that has nth_value(x) as an expression, "first" as a
table name and "ignore" as its alias. Only when you see NULLS after
IGNORE, or OVER after FIRST/LAST, do you know that you're looking at
a window function and not a from clause.

So FROM_LA would have to mean "FROM" followed by any of:

FIRST IGNORE NULLS
LAST IGNORE NULLS
FIRST RESPECT NULLS
LAST RESPECT NULLS
FIRST OVER
LAST OVER

Remember that while OVER is reserved, all of FIRST, LAST, RESPECT and
IGNORE are unreserved.

Tom> It'd take some work to extend base_yylex to look ahead 2 tokens
Tom> not one, but I'm sure that could be done. (You'd also need a
Tom> lookahead rule to match "IGNORE/RESPECT NULLS OVER", but that
Tom> seems just as doable.) Then the relevant productions use FROM_LA,
Tom> IGNORE_LA, RESPECT_LA instead of the corresponding bare tokens,
Tom> and the grammar no longer has an ambiguity problem.

Yeah, but at the cost of having to extend base_yylex to go 3 tokens
ahead (not 2) rather than the current single lookahead slot.

Doable, certainly (probably not much harder to do 3 than 2 actually)

--
Andrew (irc:RhodiumToad)

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#8)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> If you just think of recognizing FROM FIRST/LAST, you get nowhere
Tom> because that's still legal in other contexts. But if you were to
Tom> look for FROM followed by FIRST/LAST followed by
Tom> IGNORE/RESPECT/OVER, I think that could only validly happen in
Tom> this syntax.

No; you need to go four tokens ahead in total, not three. Assuming
nth_value is unreserved, then
select nth_value(x) from first ignore;
is a valid query that has nth_value(x) as an expression, "first" as a
table name and "ignore" as its alias.

No, because once IGNORE is a keyword, even unreserved, it's not legal
as an AS-less alias. We'd be breaking queries like that no matter what.

(I know there are people around here who'd like to remove that
restriction, but it's not happening anytime soon IMO.)

regards, tom lane

#10Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#9)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

select nth_value(x) from first ignore;

Tom> No, because once IGNORE is a keyword, even unreserved, it's not
Tom> legal as an AS-less alias.

That rule only applies in the select-list, not in the FROM clause; table
aliases in FROM are just ColId, so they can be anything except a fully
reserved or type_func_name keyword.

--
Andrew (irc:RhodiumToad)

#11Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#7)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

So I've tried to rough out a decision tree for the various options on
how this might be implemented (discarding the "use precedence hacks"
option). Opinions? Additions?

(formatted for emacs outline-mode)

* 1. use lexical lookahead

  +: relatively straightforward parser changes
  +: no new reserved words
  +: has the option of working extensibly with all functions

-: base_yylex needs extending to 3 lookahead tokens

** 1.1. Allow from/ignore clause on all (or all non-agg) window function calls

If the clauses are legal on all window functions, what to do about existing
window functions for which the clauses do not make sense?

*** 1.1.1. Ignore the clause when the function isn't aware of it

  +: simple
  -: somewhat surprising for users perhaps?

*** 1.1.2. Change the behavior of the windowapi in some consistent way

  Not sure if this can work.
  +: fairly simple(maybe?) and predictable
  -: changes the behavior of existing window functions

** 1.2. Allow from/ignore clause on only certain functions

  +: avoids any unexpected behavior
  -: needs some way to control what functions allow it

*** 1.2.1. Check the function name in parse analysis against a fixed list.

  +: simple
  -: not extensible

*** 1.2.2. Provide some option in CREATE FUNCTION

  +: extensible
  -: fairly intrusive, adding stuff to create function and pg_proc

*** 1.2.3. Do something magical with function argument types

  +: doesn't need changes in create function / pg_proc
  -: it's an ugly hack

* 2. reserve nth_value etc. as functions

  +: follows the spec reasonably well
  +: less of a hack than extending base_yylex

-: new reserved words
-: more parser rules
-: not extensible

(now goto 1.2.1)

* 3. "just say no" to the spec

e.g. add new functions like lead_ignore_nulls(), or add extra boolean
args to lead() etc. telling them to skip nulls

  +: simple
  -: doesn't conform to spec
  -: using extra args isn't quite the right semantics

--
Andrew (irc:RhodiumToad)

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#11)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

So I've tried to rough out a decision tree for the various options on
how this might be implemented (discarding the "use precedence hacks"
option). Opinions? Additions?

I think it'd be worth at least drafting an implementation for the
lexical-lookahead fix. I think it's likely that we'll need to extend
base_yylex to do more lookahead in the future even if we don't do it
for this, given the SQL committee's evident love for COBOL-ish syntax
and lack of regard for what you can do in LALR(1).

The questions of how we interface to the individual window functions
are really independent of how we handle the parsing problem. My
first inclination is to just pass the flags down to the window functions
(store them in WindowObject and provide some additional inquiry functions
in windowapi.h) and let them deal with it.

If the clauses are legal on all window functions, what to do about existing
window functions for which the clauses do not make sense?

Option 1: do nothing, document that nothing happens if w.f. doesn't
implement it.

Option 2: record whether the inquiry functions got called. At end of
query, error out if they weren't and the options were used.

It's also worth wondering if we couldn't just implement the flags in
some generic fashion and not need to involve the window functions at
all. FROM LAST, for example, could and perhaps should be implemented
by inverting the sort order. Possibly IGNORE NULLS could be implemented
inside the WinGetFuncArgXXX functions? These behaviors might or might
not make much sense with other window functions, but that doesn't seem
like it's our problem.

regards, tom lane

#13Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#12)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

So I've tried to rough out a decision tree for the various options
on how this might be implemented (discarding the "use precedence
hacks" option). Opinions? Additions?

Tom> I think it'd be worth at least drafting an implementation for the
Tom> lexical-lookahead fix. I think it's likely that we'll need to
Tom> extend base_yylex to do more lookahead in the future even if we
Tom> don't do it for this, given the SQL committee's evident love for
Tom> COBOL-ish syntax and lack of regard for what you can do in
Tom> LALR(1).

That's not _quite_ a fair criticism of the SQL committee; they're not
ignoring the capabilities of parsers, they're just not making their
syntax robust in the presence of the kinds of extensions and
generalizations that we want to do.

Without exception that I know of, every time we've run into a problem
that needed ugly precedence rules or extra lookahead it's been caused by
us not reserving something that is reserved in the spec, or by us
allowing constructs that are not in the spec at all (postfix operators,
especially), or by us deliberately generalizing what the spec allows
(e.g. allowing full expressions where the spec only allows column
references, or allowing extra parens around subselects) or where we've
repurposed syntax from the spec in an incompatible way (as in
ANY(array)).

Anyway, for the time being I will mark this patch as "returned with
feedback".

If the clauses are legal on all window functions, what to do about
existing window functions for which the clauses do not make sense?

Tom> Option 1: do nothing, document that nothing happens if w.f.
Tom> doesn't implement it.

That was 1.1.1 on my list.

Tom> Option 2: record whether the inquiry functions got called. At end
Tom> of query, error out if they weren't and the options were used.

Erroring at the _end_ of the query seems a bit of a potential surprise.

Tom> It's also worth wondering if we couldn't just implement the flags
Tom> in some generic fashion and not need to involve the window
Tom> functions at all.

That was what I meant by option 1.1.2 on my list.

Tom> FROM LAST, for example, could and perhaps should be implemented by
Tom> inverting the sort order.

Actually that can't work for reasons brought up in the recent discussion
of optimization of window function sorts: if you change the sort order
you potentially disturb the ordering of peer rows, and the spec requires
that an (nth_value(x,n) from last over w) and (otherfunc(x) over w) for
order-equivalent windows "w" must see the peer rows in the same order.

So FROM LAST really does have to keep the original sort order, and count
backwards from the end of the window.

Tom> Possibly IGNORE NULLS could be implemented inside the
Tom> WinGetFuncArgXXX functions? These behaviors might or might not
Tom> make much sense with other window functions, but that doesn't seem
Tom> like it's our problem.

That's about what I was thinking for option 1.1.2, yes.

--
Andrew (irc:RhodiumToad)

#14Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Krasiyan Andreev (#3)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

"Krasiyan" == Krasiyan Andreev <krasiyan@gmail.com> writes:

Krasiyan> I am using last version from more than two months ago in
Krasiyan> production environment with real data and I didn't find any
Krasiyan> bugs, so I'm marking this patch as ready for committer in the
Krasiyan> commitfest app.

Oliver (or anyone else), do you plan to continue working on this in the
immediate future, in line with the comments from myself and Tom in this
thread? If so I'll bump it to the next CF, otherwise I'll mark it
"returned with feedback".

I'm happy to help out with further work on this patch if needed, time
permitting.

--
Andrew (irc:RhodiumToad)

#15Stephen Frost
sfrost@snowman.net
In reply to: Andrew Gierth (#11)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

Greetings,

This seems to have died out, and that's pretty unfortunate because this
is awfully useful SQL standard syntax that people look for and wish we
had.

* Andrew Gierth (andrew@tao11.riddles.org.uk) wrote:

So I've tried to rough out a decision tree for the various options on
how this might be implemented (discarding the "use precedence hacks"
option). Opinions? Additions?

(formatted for emacs outline-mode)

* 1. use lexical lookahead

+: relatively straightforward parser changes
+: no new reserved words
+: has the option of working extensibly with all functions

-: base_yylex needs extending to 3 lookahead tokens

This sounds awful grotty and challenging to do and get right, and the
alternative (just reserving these, as the spec does) doesn't seem so
draconian as to be that much of an issue.

* 2. reserve nth_value etc. as functions

+: follows the spec reasonably well
+: less of a hack than extending base_yylex

-: new reserved words
-: more parser rules
-: not extensible

For my 2c, at least, reserving these strikes me as entirely reasonable.
Yes, it sucks that we have to partially-reserve some additional
keywords, but such is life. I get that we'll throw syntax errors
sometimes when we might have given a better error, but I think we can
accept that.

(now goto 1.2.1)

Hmm, not sure this was right? but sure, I'll try...

*** 1.2.1. Check the function name in parse analysis against a fixed list.

+: simple
-: not extensible

Seems like this is more-or-less required since we'd be reserving them..?

*** 1.2.2. Provide some option in CREATE FUNCTION

+: extensible
-: fairly intrusive, adding stuff to create function and pg_proc

How would this work though, if we reserve the functions as keywords..?
Maybe I'm not entirely following, but wouldn't attempts to use other
functions end up with syntax errors in at least some of the cases,
meaning that having other functions support this wouldn't really work?
I don't particularly like the idea that some built-in functions would
always work but others would work but only some of the time.

*** 1.2.3. Do something magical with function argument types

+: doesn't need changes in create function / pg_proc
-: it's an ugly hack

Not really a fan of 'ugly hack'.

* 3. "just say no" to the spec

e.g. add new functions like lead_ignore_nulls(), or add extra boolean
args to lead() etc. telling them to skip nulls

+: simple
-: doesn't conform to spec
-: using extra args isn't quite the right semantics

Ugh, no thank you.

Thanks!

Stephen

#16Krasiyan Andreev
krasiyan@gmail.com
In reply to: Stephen Frost (#15)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

Thank you very much for feedback and yes, that is very useful SQL syntax.
Maybe you miss my previous answer, but you are right, that patch is
currently dead,
because some important design questions must be discussed here, before
patch rewriting.

I have dropped support of from first/last for nth_value(),
but also I reimplemented it in a different way,
by using negative number for the position argument, to be able to get the
same frame in exact reverse order.
After that patch becomes much more simple and some concerns about
precedence hack has gone.

I have not renamed special bool type "ignorenulls"
(I know that it is not acceptable way for calling extra version
of window functions, but also it makes things very easy and it can reuse
frames),
but I removed the other special bool type "fromlast".

Attached file was for PostgreSQL 13 (master git branch, last commit fest),
everything was working and patch was at the time in very good shape, all
tests was passed.

I read previous review and suggestions from Tom about special bool type and
unreserved keywords and also,
that IGNORE NULLS could be implemented inside the WinGetFuncArgXXX
functions,
but I am not sure how exactly to proceed (some example will be very
helpful).

На чт, 30.04.2020 г. в 21:58 Stephen Frost <sfrost@snowman.net> написа:

Show quoted text

Greetings,

This seems to have died out, and that's pretty unfortunate because this
is awfully useful SQL standard syntax that people look for and wish we
had.

* Andrew Gierth (andrew@tao11.riddles.org.uk) wrote:

So I've tried to rough out a decision tree for the various options on
how this might be implemented (discarding the "use precedence hacks"
option). Opinions? Additions?

(formatted for emacs outline-mode)

* 1. use lexical lookahead

+: relatively straightforward parser changes
+: no new reserved words
+: has the option of working extensibly with all functions

-: base_yylex needs extending to 3 lookahead tokens

This sounds awful grotty and challenging to do and get right, and the
alternative (just reserving these, as the spec does) doesn't seem so
draconian as to be that much of an issue.

* 2. reserve nth_value etc. as functions

+: follows the spec reasonably well
+: less of a hack than extending base_yylex

-: new reserved words
-: more parser rules
-: not extensible

For my 2c, at least, reserving these strikes me as entirely reasonable.
Yes, it sucks that we have to partially-reserve some additional
keywords, but such is life. I get that we'll throw syntax errors
sometimes when we might have given a better error, but I think we can
accept that.

(now goto 1.2.1)

Hmm, not sure this was right? but sure, I'll try...

*** 1.2.1. Check the function name in parse analysis against a fixed

list.

+: simple
-: not extensible

Seems like this is more-or-less required since we'd be reserving them..?

*** 1.2.2. Provide some option in CREATE FUNCTION

+: extensible
-: fairly intrusive, adding stuff to create function and pg_proc

How would this work though, if we reserve the functions as keywords..?
Maybe I'm not entirely following, but wouldn't attempts to use other
functions end up with syntax errors in at least some of the cases,
meaning that having other functions support this wouldn't really work?
I don't particularly like the idea that some built-in functions would
always work but others would work but only some of the time.

*** 1.2.3. Do something magical with function argument types

+: doesn't need changes in create function / pg_proc
-: it's an ugly hack

Not really a fan of 'ugly hack'.

* 3. "just say no" to the spec

e.g. add new functions like lead_ignore_nulls(), or add extra boolean
args to lead() etc. telling them to skip nulls

+: simple
-: doesn't conform to spec
-: using extra args isn't quite the right semantics

Ugh, no thank you.

Thanks!

Stephen

Attachments:

pg13_ignore_nulls.patchtext/x-patch; charset=US-ASCII; name=pg13_ignore_nulls.patchDownload+975-52
#17Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#12)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

I revisited the thread:
/messages/by-id/CAGMVOdsbtRwE_4+v8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A@mail.gmail.com

and came up with attached POC patch (I used some varibale names
appearing in the Krasiyan Andreev's patch). I really love to have
RESPECT/IGNORE NULLS because I believe they are convenient for
users. For FIRST/LAST I am not so excited since there are alternatives
as our document stats, so FIRST/LAST are not included in the patch.

Currently in the patch only nth_value is allowed to use RESPECT/IGNORE
NULLS. I think it's not hard to implement it for others (lead, lag,
first_value and last_value). No document nor test patches are
included for now.

Note that RESPECT/IGNORE are not registered as reserved keywords in
this patch (but registered as unreserved keywords). I am not sure if
this is acceptable or not.

The questions of how we interface to the individual window functions
are really independent of how we handle the parsing problem. My
first inclination is to just pass the flags down to the window functions
(store them in WindowObject and provide some additional inquiry functions
in windowapi.h) and let them deal with it.

I agree with this. Also I do not change the prototype of
nth_value. So I pass RESPECT/IGNORE NULLS information from the raw
parser to parse/analysis and finally to WindowObject.

It's also worth wondering if we couldn't just implement the flags in
some generic fashion and not need to involve the window functions at
all. FROM LAST, for example, could and perhaps should be implemented
by inverting the sort order. Possibly IGNORE NULLS could be implemented
inside the WinGetFuncArgXXX functions? These behaviors might or might
not make much sense with other window functions, but that doesn't seem
like it's our problem.

Yes, probably we could make WinGetFuncArgXXX a little bit smarter in
this direction (not implemented in the patch at this point).

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Attachments:

v1-0001-Implement-IGNORE-or-RESPECT-NULLS-parse-analysis-.patchtext/x-patch; charset=us-asciiDownload+43-5
v1-0002-Implement-IGNORE-or-RESPECT-NULLS-planner-part.patchtext/x-patch; charset=us-asciiDownload+1-1
v1-0003-Implement-IGNORE-or-RESPECT-NULLS-executor-and-wi.patchtext/x-patch; charset=us-asciiDownload+40-4
#18Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#17)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

On Sat, 22 Apr 2023, 13:14 Tatsuo Ishii, <ishii@sraoss.co.jp> wrote:

I revisited the thread:

/messages/by-id/CAGMVOdsbtRwE_4+v8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A@mail.gmail.com

and came up with attached POC patch (I used some varibale names
appearing in the Krasiyan Andreev's patch). I really love to have
RESPECT/IGNORE NULLS because I believe they are convenient for
users. For FIRST/LAST I am not so excited since there are alternatives
as our document stats, so FIRST/LAST are not included in the patch.

Currently in the patch only nth_value is allowed to use RESPECT/IGNORE
NULLS. I think it's not hard to implement it for others (lead, lag,
first_value and last_value). No document nor test patches are
included for now.

I've actually recently been looking at this feature again recently as well.
One thing I wondered, but would need consensus, is to take the
SEEK_HEAD/SEEK_TAIL case statements out of WinGetFuncArgInPartition. This
function is only called by leadlag_common, which uses SEEK_CURRENT, so
those case statements are never reached. Taking them out simplifies the
code as it is but means future features might need it re-added (although
I'm not sure the use case for it, as that function is for window funcs that
ignore the frame options).

Note that RESPECT/IGNORE are not registered as reserved keywords in
this patch (but registered as unreserved keywords). I am not sure if
this is acceptable or not.

The questions of how we interface to the individual window functions
are really independent of how we handle the parsing problem. My
first inclination is to just pass the flags down to the window functions
(store them in WindowObject and provide some additional inquiry functions
in windowapi.h) and let them deal with it.

I agree with this. Also I do not change the prototype of

nth_value. So I pass RESPECT/IGNORE NULLS information from the raw
parser to parse/analysis and finally to WindowObject.

This is a much better option than my older patch which needed to change the
functions.

It's also worth wondering if we couldn't just implement the flags in
some generic fashion and not need to involve the window functions at
all. FROM LAST, for example, could and perhaps should be implemented
by inverting the sort order. Possibly IGNORE NULLS could be implemented
inside the WinGetFuncArgXXX functions? These behaviors might or might
not make much sense with other window functions, but that doesn't seem
like it's our problem.

Yes, probably we could make WinGetFuncArgXXX a little bit smarter in
this direction (not implemented in the patch at this point).

+1 for doing it here. Maybe also refactor WinGetFuncArgInFrame, putting the
exclusion checks in a static function as that function is already pretty
big?

Show quoted text

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

#19Vik Fearing
vik@postgresfriends.org
In reply to: Tatsuo Ishii (#17)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

On 4/22/23 14:14, Tatsuo Ishii wrote:

I revisited the thread:
/messages/by-id/CAGMVOdsbtRwE_4+v8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A@mail.gmail.com

and came up with attached POC patch (I used some varibale names
appearing in the Krasiyan Andreev's patch). I really love to have
RESPECT/IGNORE NULLS because I believe they are convenient for
users.

Excellent. I was thinking about picking my version of this patch up
again, but I think this might be better than mine.

I am curious why set_mark is false in the IGNORE version instead of also
being const_offset. Surely the nth non-null in the frame will never go
backwards.

Dealing with marks was the main reason (I think) that my patch was not
accepted.

For FIRST/LAST I am not so excited since there are alternatives
as our document stats,

I disagree with this. The point of having FROM LAST is to avoid
calculating a new window and running a new pass over it.

so FIRST/LAST are not included in the patch.

I do agree that we can have <null treatment> without <from first or
last> so let's move forward with this and handle the latter later.

Currently in the patch only nth_value is allowed to use RESPECT/IGNORE
NULLS.

This should not be hard coded. It should be a new field in pg_proc
(with a sanity check that it is only true for window functions). That
way custom window functions can implement it.

I think it's not hard to implement it for others (lead, lag,
first_value and last_value).

It doesn't seem like it should be, no.

No document nor test patches are included for now.

I can volunteer to work on these if you want.

Note that RESPECT/IGNORE are not registered as reserved keywords in
this patch (but registered as unreserved keywords). I am not sure if
this is acceptable or not.

For me, this is perfectly okay. Keep them at the lowest level of
reservation as possible.
--
Vik Fearing

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vik Fearing (#19)
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

Vik Fearing <vik@postgresfriends.org> writes:

On 4/22/23 14:14, Tatsuo Ishii wrote:

Note that RESPECT/IGNORE are not registered as reserved keywords in
this patch (but registered as unreserved keywords). I am not sure if
this is acceptable or not.

For me, this is perfectly okay. Keep them at the lowest level of
reservation as possible.

Yeah, keep them unreserved if at all possible. Any higher reservation
level risks breaking existing applications that might be using these
words as column or function names.

regards, tom lane

#21Tatsuo Ishii
ishii@postgresql.org
In reply to: Vik Fearing (#19)
#22Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#20)
#23Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#22)
#24Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#23)
#25Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#24)
#26Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#25)
#27Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#26)
#28Oliver Ford
ojford@gmail.com
In reply to: Oliver Ford (#26)
#29Vik Fearing
vik@postgresfriends.org
In reply to: Oliver Ford (#28)
#30Oliver Ford
ojford@gmail.com
In reply to: Vik Fearing (#29)
#31Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#30)
#32David G. Johnston
david.g.johnston@gmail.com
In reply to: Tatsuo Ishii (#31)
#33Tatsuo Ishii
ishii@postgresql.org
In reply to: David G. Johnston (#32)
#34Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#31)
#35Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#35)
#37Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#36)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#37)
#39Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#38)
#40Oliver Ford
ojford@gmail.com
In reply to: Tom Lane (#38)
#41Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#40)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#41)
#43Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#41)
#44Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#43)
#45Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#43)
#46Krasiyan Andreev
krasiyan@gmail.com
In reply to: Tatsuo Ishii (#45)
#47Tatsuo Ishii
ishii@postgresql.org
In reply to: Krasiyan Andreev (#46)
#48Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#47)
#49Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#48)
#50Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#49)
#51Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#50)
#52Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#51)
#53Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#52)
#54Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#52)
#55Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#54)
#56Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#55)
#57Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#56)
#58Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#57)
#59Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#58)
#60Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#59)
#61Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#60)
#62Oliver Ford
ojford@gmail.com
In reply to: Oliver Ford (#61)
#63Krasiyan Andreev
krasiyan@gmail.com
In reply to: Oliver Ford (#62)
#64Tatsuo Ishii
ishii@postgresql.org
In reply to: Krasiyan Andreev (#63)
#65Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#64)
#66Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#65)
#67Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#66)
#68Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#67)
#69Krasiyan Andreev
krasiyan@gmail.com
In reply to: Tatsuo Ishii (#68)
#70Tatsuo Ishii
ishii@postgresql.org
In reply to: Krasiyan Andreev (#69)
#71Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#68)
#72Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#71)
#73Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#72)
#74Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#73)
#75Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#74)
#76Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#75)
#77Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#75)
#78Chao Li
li.evan.chao@gmail.com
In reply to: Tatsuo Ishii (#77)
#79Tatsuo Ishii
ishii@postgresql.org
In reply to: Chao Li (#78)
#80Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#79)
#81Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#80)
#82Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#81)
#83Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#82)
#84Oliver Ford
ojford@gmail.com
In reply to: Tatsuo Ishii (#83)
#85Tatsuo Ishii
ishii@postgresql.org
In reply to: Oliver Ford (#84)
#86Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#85)
#87Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tatsuo Ishii (#85)
#88Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#86)
#89Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#87)
#90Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#86)
#91Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#89)
#92Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#88)
#93Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#90)
#94Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#93)
#95Paul Ramsey
pramsey@cleverelephant.ca
In reply to: Tatsuo Ishii (#94)
#96Tatsuo Ishii
ishii@postgresql.org
In reply to: Alvaro Herrera (#87)
#97Tatsuo Ishii
ishii@postgresql.org
In reply to: Paul Ramsey (#95)
#98Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#97)
#99Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#98)
#100Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#99)
#101Chao Li
li.evan.chao@gmail.com
In reply to: Tatsuo Ishii (#100)
#102Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#99)
#103Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#102)
#104Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#98)
#105Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#99)
#106Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#99)
#107Tatsuo Ishii
ishii@postgresql.org
In reply to: Chao Li (#101)
#108Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tatsuo Ishii (#107)
#109Tatsuo Ishii
ishii@postgresql.org
In reply to: Alvaro Herrera (#108)
#110Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#107)
#111Michael Paquier
michael@paquier.xyz
In reply to: Tatsuo Ishii (#110)
#112Tatsuo Ishii
ishii@postgresql.org
In reply to: Michael Paquier (#111)
#113Chao Li
li.evan.chao@gmail.com
In reply to: Tatsuo Ishii (#112)
#114Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#112)
#115Tatsuo Ishii
ishii@postgresql.org
In reply to: Chao Li (#113)
#116Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#105)
#117Chao Li
li.evan.chao@gmail.com
In reply to: Tatsuo Ishii (#116)
#118Tatsuo Ishii
ishii@postgresql.org
In reply to: Chao Li (#117)
#119Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#116)
#120David Rowley
dgrowleyml@gmail.com
In reply to: Tatsuo Ishii (#118)
#121Tatsuo Ishii
ishii@postgresql.org
In reply to: David Rowley (#120)
#122Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#121)