Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead,
lag, [...]. This is not implemented in PostgreSQL
(http://www.postgresql.org/docs/devel/static/functions-window.html)
I've had a go at implementing this, and I've attached the resulting patch.
It's not finished yet, but I was hoping to find out if my solution is along
the right lines.
In particular, I'm storing the ignore-nulls flag in the frameOptions of a
window function definition, and am adding a function to the windowapi.h to
get at these options. I'm keeping the last non-null value in
WinGetPartitionLocalMemory (which I hope is the right place), but I'm not
using any of the *GetDatum macros to access it.
An example of my change's behaviour:
nwhite=# select *, lag(num,0) ignore nulls over (order by generate_series)
from
nwhite-# (select generate_series from generate_series(0,10)) s
nwhite-# left outer join
nwhite-# numbers n
nwhite-# on (s.generate_series = n.num);
generate_series | num | lag
-----------------+-----+-----
0 | |
1 | 1 | 1
2 | | 1
3 | | 1
4 | 4 | 4
5 | 5 | 5
6 | | 5
7 | | 5
8 | | 5
9 | 9 | 9
10 | | 9
(11 rows)
I'd find this feature really useful, so I hope you can help me get my patch
to a contributable state.
Thanks -
Nick
Attachments:
lead-lag-ignore-nulls.patchapplication/octet-stream; name=lead-lag-ignore-nulls.patchDownload+197-35
Nicholas White <n.j.white@gmail.com> writes:
The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead,
lag, [...]. This is not implemented in PostgreSQL
(http://www.postgresql.org/docs/devel/static/functions-window.html)
I've had a go at implementing this, and I've attached the resulting patch.
It's not finished yet, but I was hoping to find out if my solution is along
the right lines.
Since we're trying to get 9.3 to closure, this patch probably isn't
going to get much attention until the 9.4 development cycle starts
(in a couple of months, likely). In the meantime, please add it to
the next commitfest list so we remember to come back to it:
https://commitfest.postgresql.org/action/commitfest_view?id=18
One comment just from a quick eyeball look is that we really hate
adding new keywords that aren't UNRESERVED, because that risks
breaking existing applications. Please see if you can refactor the
grammar to make those new entries unreserved.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks - I've added it here:
https://commitfest.postgresql.org/action/patch_view?id=1096 .
I've also attached a revised version that makes IGNORE and RESPECT
UNRESERVED keywords (following the pattern of NULLS_FIRST and NULLS_LAST).
Nick
On 23 March 2013 14:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
Nicholas White <n.j.white@gmail.com> writes:
The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead,
lag, [...]. This is not implemented in PostgreSQL
(http://www.postgresql.org/docs/devel/static/functions-window.html)
I've had a go at implementing this, and I've attached the resultingpatch.
It's not finished yet, but I was hoping to find out if my solution is
along
the right lines.
Since we're trying to get 9.3 to closure, this patch probably isn't
going to get much attention until the 9.4 development cycle starts
(in a couple of months, likely). In the meantime, please add it to
the next commitfest list so we remember to come back to it:
https://commitfest.postgresql.org/action/commitfest_view?id=18One comment just from a quick eyeball look is that we really hate
adding new keywords that aren't UNRESERVED, because that risks
breaking existing applications. Please see if you can refactor the
grammar to make those new entries unreserved.regards, tom lane
Attachments:
lead-lag-ignore-nulls.patchapplication/octet-stream; name=lead-lag-ignore-nulls.patchDownload+274-32
On Sat, Mar 23, 2013 at 3:25 PM, Nicholas White <n.j.white@gmail.com> wrote:
Thanks - I've added it here:
https://commitfest.postgresql.org/action/patch_view?id=1096 .I've also attached a revised version that makes IGNORE and RESPECT
UNRESERVED keywords (following the pattern of NULLS_FIRST and NULLS_LAST).
Hm, you made another lookahead in base_yylex to make them unreserved --
looks ok, but not sure if there was no way to do it.
You might want to try byref types such as text. It seems you need to copy
the datum to save the value in appropriate memory context. Also, try to
create a view on those expressions. I don't think it correctly preserves
it.
Thanks,
--
Hitoshi Harada
Thanks for the feedback.
For the parsing changes, it seems I can either make RESPECT and IGNORE
reserved keywords, or add a lookahead to construct synthetic RESPECT NULLS
and IGNORE NULLS keywords. The grammar wouldn't compile if RESPECT and
IGNORE were just normal unreserved keywords due to ambiguities after a
function definition (e.g. select abs(1) respect; - which is currently a
valid statement).
I've redone the leadlag function changes to use datumCopy as you suggested.
However, I've had to remove the NOT_USED #ifdef around datumFree so I can
use it to avoid building up large numbers of copies of Datums in the memory
context while a query is executing. I've attached the revised patch...
Thanks -
Nick
On 24 March 2013 03:43, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
Show quoted text
On Sat, Mar 23, 2013 at 3:25 PM, Nicholas White <n.j.white@gmail.com>wrote:
Thanks - I've added it here:
https://commitfest.postgresql.org/action/patch_view?id=1096 .I've also attached a revised version that makes IGNORE and RESPECT
UNRESERVED keywords (following the pattern of NULLS_FIRST and NULLS_LAST).Hm, you made another lookahead in base_yylex to make them unreserved --
looks ok, but not sure if there was no way to do it.You might want to try byref types such as text. It seems you need to copy
the datum to save the value in appropriate memory context. Also, try to
create a view on those expressions. I don't think it correctly preserves
it.Thanks,
--
Hitoshi Harada
Attachments:
lead-lag-ignore-nulls.patchapplication/octet-stream; name=lead-lag-ignore-nulls.patchDownload+354-28
On Sun, 2013-03-24 at 20:15 -0400, Nicholas White wrote:
I've redone the leadlag function changes to use datumCopy as you
suggested. However, I've had to remove the NOT_USED #ifdef around
datumFree so I can use it to avoid building up large numbers of copies
of Datums in the memory context while a query is executing. I've
attached the revised patch...
Comments:
WinGetResultDatumCopy() calls datumCopy, which will just copy in the
current memory context. I think you want it in the per-partition memory
context, otherwise the last value in each partition will stick around
until the query is done (so many partitions could be a problem). That
should be easy enough to do by switching to the
winobj->winstate->partcontext memory context before calling datumCopy,
and then switching back.
For that matter, why store the datum again at all? You can just store
the offset of the last non-NULL value in that partition, and then fetch
it using WinGetFuncArgInPartition(), right? We really want to avoid any
per-tuple allocations.
Alternatively, you might look into setting a mark when you get a
non-NULL value. Then you could just always fetch the oldest one.
Unfortunately, I think that only works with const_offset=true... so the
previous idea might be better.
I'll leave it to someone else to review the grammar changes.
Regards,
Jeff Davis
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Nicholas White escribi�:
For the parsing changes, it seems I can either make RESPECT and IGNORE
reserved keywords, or add a lookahead to construct synthetic RESPECT NULLS
and IGNORE NULLS keywords. The grammar wouldn't compile if RESPECT and
IGNORE were just normal unreserved keywords due to ambiguities after a
function definition (e.g. select abs(1) respect; - which is currently a
valid statement).
Well, making them reserved keywords is not that great, so maybe the
lookahead thingy is better. However, this patch introduces the third
and fourth uses of the "save the lookahead token" pattern in the
"default" switch cases. Can we refactor that bit somehow, to avoid so
many duplicates? (For a minute I thought that Andrew Gierth's WITH
ORDINALITY patch would add another one, but it seems not.)
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Jun 15, 2013 at 9:37 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Nicholas White escribió:
For the parsing changes, it seems I can either make RESPECT and IGNORE
reserved keywords, or add a lookahead to construct synthetic RESPECT NULLS
and IGNORE NULLS keywords. The grammar wouldn't compile if RESPECT and
IGNORE were just normal unreserved keywords due to ambiguities after a
function definition (e.g. select abs(1) respect; - which is currently a
valid statement).Well, making them reserved keywords is not that great, so maybe the
lookahead thingy is better. However, this patch introduces the third
and fourth uses of the "save the lookahead token" pattern in the
"default" switch cases. Can we refactor that bit somehow, to avoid so
many duplicates? (For a minute I thought that Andrew Gierth's WITH
ORDINALITY patch would add another one, but it seems not.)
Making things reserved keywords is painful and I don't like it, but
I've started to become skeptical of shifting the problem to the lexer,
too. Sometimes special case logic in the lexer about token combining
can have surprising consequences in other parts of the grammar. For
example, with a lexer hack, what will happen if someone has a column
named RESPECT and does SELECT ... ORDER BY respect NULLS LAST? I
haven't studied the code in detail so maybe it's fine, but it's
something to think about.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Jun 15, 2013 at 1:30 PM, Jeff Davis <pgsql@j-davis.com> wrote:
On Sun, 2013-03-24 at 20:15 -0400, Nicholas White wrote:
I've redone the leadlag function changes to use datumCopy as you
suggested. However, I've had to remove the NOT_USED #ifdef around
datumFree so I can use it to avoid building up large numbers of copies
of Datums in the memory context while a query is executing. I've
attached the revised patch...Comments:
WinGetResultDatumCopy() calls datumCopy, which will just copy in the
current memory context. I think you want it in the per-partition memory
context, otherwise the last value in each partition will stick around
until the query is done (so many partitions could be a problem). That
should be easy enough to do by switching to the
winobj->winstate->partcontext memory context before calling datumCopy,
and then switching back.For that matter, why store the datum again at all? You can just store
the offset of the last non-NULL value in that partition, and then fetch
it using WinGetFuncArgInPartition(), right? We really want to avoid any
per-tuple allocations.
I believe WinGetFuncArgInPartition is a bit slow if the offset is far from
the current row. So it might make sense to store the last-seen value, but
I'm not sure if we need to copy datum every time. I haven't looked into
the new patch.
Thanks,
--
Hitoshi Harada
Thanks for the reviews. I've attached a revised patch that has the lexer
refactoring Alvaro mentions (arbitarily using a goto rather than a bool
flag) and uses Jeff's idea of just storing the index of the last non-null
value (if there is one) in the window function's context (and not the Datum
itself).
However, Robert's right that SELECT ... ORDER BY respect NULLS LAST will
now fail. An earlier iteration of the patch had RESPECT and IGNORE as
reserved, but that would have broken tables with columns called "respect"
(etc.), which the current version allows. Is this backwards incompatibility
acceptable? If not, maybe I should try doing a two-token lookahead in the
token-aggregating code between the lexer and the parser (i.e. make a
RESPECT_NULLS token out of a sequence of RESPECT NULLS OVER tokens,
remembering to replace the OVER token)? Or what about adding an %expect
statement to the Bison grammar, confirming that the shift / reduce
conflicts caused by using the RESPECT, IGNORE & NULL_P tokens the in
out_clause rule are OK?
Thanks -
Nick
Attachments:
lead-lag-ignore-nulls.patchapplication/octet-stream; name=lead-lag-ignore-nulls.patchDownload+332-61
On Tue, Jun 18, 2013 at 6:27 PM, Nicholas White <n.j.white@gmail.com> wrote:
Thanks for the reviews. I've attached a revised patch that has the lexer
refactoring Alvaro mentions (arbitarily using a goto rather than a bool
flag) and uses Jeff's idea of just storing the index of the last non-null
value (if there is one) in the window function's context (and not the Datum
itself).However, Robert's right that SELECT ... ORDER BY respect NULLS LAST will now
fail. An earlier iteration of the patch had RESPECT and IGNORE as reserved,
but that would have broken tables with columns called "respect" (etc.),
which the current version allows. Is this backwards incompatibility
acceptable?
I think it's better to add new partially reserved keywords than to
have this kind of random breakage. When you make something a
partially-reserved keyword, then things break, but in a fairly
well-defined way. Lexer hacks can break things in ways that are much
subtler, which we may not even realize for a long time, and which in
that case would mean that the word "respect" needs to be quoted in
some contexts but not others. That's going to cause a lot of
headaches, because pg_dump etc. know about quoting reserved keywords,
but they don't know anything about quoting unreserved keywords in
contexts where they might happen to be followed by the wrong next
word.
If not, maybe I should try doing a two-token lookahead in the
token-aggregating code between the lexer and the parser (i.e. make a
RESPECT_NULLS token out of a sequence of RESPECT NULLS OVER tokens,
remembering to replace the OVER token)? Or what about adding an %expect
statement to the Bison grammar, confirming that the shift / reduce conflicts
caused by using the RESPECT, IGNORE & NULL_P tokens the in out_clause rule
are OK?
These lines of inquiry don't seem promising to me. It's going to be
complicated and unmaintainable and may just move the failure scenarios
to cases that are too obscure for us to reason about.
I think the question is whether this feature is really worth adding
new reserved keywords for. I have a hard time saying we shouldn't
support something that's part of the SQL standard, but personally,
it's not something I've seen come up prior to this thread.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, 2013-06-20 at 10:03 -0400, Robert Haas wrote:
I think the question is whether this feature is really worth adding
new reserved keywords for. I have a hard time saying we shouldn't
support something that's part of the SQL standard, but personally,
it's not something I've seen come up prior to this thread.
What's the next step here?
The feature sounds useful to me. If the grammar is unacceptable, does
someone have an alternative idea, like using new function names instead
of grammar? If so, what are reasonable names to use?
Also, I think someone mentioned this already, but what about
first_value() and last_value()? Shouldn't we do those at the same time?
Regards,
Jeff Davis
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 21, 2013 at 12:18 AM, Jeff Davis <pgsql@j-davis.com> wrote:
On Thu, 2013-06-20 at 10:03 -0400, Robert Haas wrote:
I think the question is whether this feature is really worth adding
new reserved keywords for. I have a hard time saying we shouldn't
support something that's part of the SQL standard, but personally,
it's not something I've seen come up prior to this thread.What's the next step here?
Well, ideally, some other people weigh in on the value of the feature
vs. the pain of reserving the keywords.
The feature sounds useful to me.
...and there's one person with an opinion now! :-)
The other question here is - do we actually have the grammar right?
As in, is this actually the syntax we're supposed to be implementing?
It looks different from what's shown here, where the IGNORE NULLS is
inside the function's parentheses, rather than afterwards:
http://rwijk.blogspot.com/2010/06/simulating-laglead-ignore-nulls.html
IBM seems to think it's legal either inside or outside the parentheses:
Regardless of what syntax we settle on, we should also make sure that
the conflict is intrinsic to the grammar and can't be factored out, as
Tom suggested upthread. It's not obvious to me what the actual
ambiguity is here. If you've seen "select lag(num,0)" and the
lookahead token is "respect", what's the problem? It sort of looks
like it could be a column label, but not even unreserved keywords can
be column labels, so that's not it. Probably deserves a bit more
investigation...
If the grammar is unacceptable, does
someone have an alternative idea, like using new function names instead
of grammar? If so, what are reasonable names to use?
We could just add additional, optional Boolean argument to the
existing functions. It's non-standard, but we avoid adding keywords.
Also, I think someone mentioned this already, but what about
first_value() and last_value()? Shouldn't we do those at the same time?
Not sure.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, 2013-06-21 at 09:21 -0400, Robert Haas wrote:
The other question here is - do we actually have the grammar right?
As in, is this actually the syntax we're supposed to be implementing?
It looks different from what's shown here, where the IGNORE NULLS is
inside the function's parentheses, rather than afterwards:http://rwijk.blogspot.com/2010/06/simulating-laglead-ignore-nulls.html
IBM seems to think it's legal either inside or outside the parentheses:
The spec seems pretty clear that it falls outside of the parentheses,
unless I am missing something.
Regardless of what syntax we settle on, we should also make sure that
the conflict is intrinsic to the grammar and can't be factored out, as
Tom suggested upthread. It's not obvious to me what the actual
ambiguity is here. If you've seen "select lag(num,0)" and the
lookahead token is "respect", what's the problem? It sort of looks
like it could be a column label, but not even unreserved keywords can
be column labels, so that's not it. Probably deserves a bit more
investigation...
I think the problem is when the function is used as a table function;
e.g.:
SELECT * FROM generate_series(1,10) respect;
We could just add additional, optional Boolean argument to the
existing functions. It's non-standard, but we avoid adding keywords.
I thought about that, but it is awkward because the argument would have
to be constant (or, if not, it would be quite strange).
Regards,
Jeff Davis
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 21, 2013 at 11:33 AM, Jeff Davis <pgsql@j-davis.com> wrote:
Regardless of what syntax we settle on, we should also make sure that
the conflict is intrinsic to the grammar and can't be factored out, as
Tom suggested upthread. It's not obvious to me what the actual
ambiguity is here. If you've seen "select lag(num,0)" and the
lookahead token is "respect", what's the problem? It sort of looks
like it could be a column label, but not even unreserved keywords can
be column labels, so that's not it. Probably deserves a bit more
investigation...I think the problem is when the function is used as a table function;
e.g.:SELECT * FROM generate_series(1,10) respect;
Ah ha. Well, there's probably not much help for that. Disallowing
keywords as table aliases would be a cure worse than the disease, I
think. I suppose the good news is that there probably aren't many
people using RESPECT as a column name, but I have a feeling that we're
almost certain to get complaints about reserving IGNORE. I think that
will have to be quoted as a PL/pgsql variable name as well. :-(
We could just add additional, optional Boolean argument to the
existing functions. It's non-standard, but we avoid adding keywords.I thought about that, but it is awkward because the argument would have
to be constant (or, if not, it would be quite strange).
True... but e.g. string_agg() has the same issue.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello all
I've been examining PostgreSQL to gain a greater understanding
of RDBMS. (Thanks for a nice, very educational system!)
In the process I've been looking into a few problems and the
complications of this patch appeared relatively uninvolved, so I
tried to look for a solution.
I found the following:
The grammar conflict appears to be because of ambiguities in:
1. table_ref (used exclusively in FROM clauses)
2. index_elem (used exclusively in INDEX creation statements).
Now, this doesn't seem to make much sense, as AFAICT window functions
are explicitly disallowed in these contexts (transformWindowFuncCall
will yield errors, and I can't really wrap my head around what a
window function call would mean there).
I therefore propose a simple rearrangement of the grammar,
syntactically disallowing window functions in the outer part of those
contexts (a_expr's inside can't and shouldn't be done much about)
which will allow both RESPECT and IGNORE to become unreserved
keywords, without doing any lexer hacking or abusing the grammar.
I've attached a patch which will add RESPECT NULLS and IGNORE NULLS to
the grammar in the right place. Also the window frame options are set
but nothing more, so this patch needs to be merged with Nicholas White's
original patch.
One problem I see with this approach to the grammar is that the
error messages will change when putting window functions in any of the
forbidden places. The new error messages are I think worse and less
specific than the old ones. I suppose that can be fixed though, or
maybe the problem isn't so severe.
Example of old error message:
window functions are not allowed in functions in FROM
New error message:
syntax error at or near "OVER"
in addition I think the original patch as it stands has a few
problems that I haven't seen discussed:
1. The result of the current patch using lead
create table test_table (
id serial,
val integer);
insert into test_table (val) select * from unnest(ARRAY[1,2,3,4,NULL, NULL,
NULL, 5, 6, 7]);
select val, lead(val, 2) ignore nulls over (order by id) from test_table;
val | lead
-----+------
1 | 3
2 | 4
3 | 4
4 | 4
| 4
| 5
| 6
5 | 7
6 | 7
7 | 7
(10 rows)
I would expect it to output:
select val, lead(val, 2) ignore nulls over (order by id) from test_table;
val | lead
-----+------
1 | 3
2 | 4
3 | 5
4 | 6
| 6
| 6
| 6
5 | 7
6 |
7 |
(10 rows)
That is: skip two rows forward not counting null rows.
The lag behavior works well as far as I can see.
2. It would be nice if an error was given when ignore nulls was used
on a function for which it had no effect. Perhaps this should be up to
the different window function themselves to do though.
Apart from those points I think the original patch is nice and provides a
functionality
that's definitely nice to have.
Kind Regards
Troels Nielsen
On Fri, Jun 21, 2013 at 8:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Show quoted text
On Fri, Jun 21, 2013 at 11:33 AM, Jeff Davis <pgsql@j-davis.com> wrote:
Regardless of what syntax we settle on, we should also make sure that
the conflict is intrinsic to the grammar and can't be factored out, as
Tom suggested upthread. It's not obvious to me what the actual
ambiguity is here. If you've seen "select lag(num,0)" and the
lookahead token is "respect", what's the problem? It sort of looks
like it could be a column label, but not even unreserved keywords can
be column labels, so that's not it. Probably deserves a bit more
investigation...I think the problem is when the function is used as a table function;
e.g.:SELECT * FROM generate_series(1,10) respect;
Ah ha. Well, there's probably not much help for that. Disallowing
keywords as table aliases would be a cure worse than the disease, I
think. I suppose the good news is that there probably aren't many
people using RESPECT as a column name, but I have a feeling that we're
almost certain to get complaints about reserving IGNORE. I think that
will have to be quoted as a PL/pgsql variable name as well. :-(We could just add additional, optional Boolean argument to the
existing functions. It's non-standard, but we avoid adding keywords.I thought about that, but it is awkward because the argument would have
to be constant (or, if not, it would be quite strange).True... but e.g. string_agg() has the same issue.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachments:
respect_nulls_and_ignore_nulls_grammar.patchapplication/octet-stream; name=respect_nulls_and_ignore_nulls_grammar.patchDownload+83-39
Good catch - I've attached a patch to address your point 1. It now returns
the below (i.e. correctly doesn't fill in the saved value if the index is
out of the window. However, I'm not sure whether (e.g.) lead-2-ignore-nulls
means count forwards two rows, and if that's null use the last one you've
seen (the current implementation) or count forwards two non-null rows (as
you suggest). The behaviour isn't specified in a (free) draft of the 2003
standard (http://www.wiscorp.com/sql_2003_standard.zip), and I don't have
access to the (non-free) final version. Could someone who does have access
to it clarify this? I've also added your example to the regression test
cases.
select val, lead(val, 2) ignore nulls over (order by id) from test_table;
val | lead
-----+------
1 | 3
2 | 4
3 | 4
4 | 4
| 4
| 5
| 6
5 | 7
6 |
7 |
(10 rows)
If the other reviewers are happy with your grammar changes then I'll merge
them into the patch. Alternatively, if departing from the standard is OK
then we could reorder the keywords so that a window function is like SELECT
lag(x,1) OVER RESPECT NULLS (ORDER BY y) - i.e. putting the respect /
ignore tokens after the OVER reserved keyword. Although non-standard it'd
make the grammar change trivial.
Also, I think someone mentioned this already, but what about
first_value() and last_value()? Shouldn't we do those at the same time?
I didn't include this functionality for the first / last value window
functions as their implementation is currently a bit different; they just
call WinGetFuncArgInFrame to pick out a single value. Making these
functions respect nulls would involve changing the single lookup to a walk
through the tuples to find the first non-null version, and keeping track of
this index in a struct in the context. As this change is reasonably
orthogonal I was going to submit it as a separate patch.
Thanks -
Attachments:
lead-lag-ignore-nulls.patchapplication/octet-stream; name=lead-lag-ignore-nulls.patchDownload+381-68
On Fri, Jun 21, 2013 at 6:29 PM, Troels Nielsen <bn.troels@gmail.com> wrote:
The grammar conflict appears to be because of ambiguities in:
1. table_ref (used exclusively in FROM clauses)
2. index_elem (used exclusively in INDEX creation statements).Now, this doesn't seem to make much sense, as AFAICT window functions
are explicitly disallowed in these contexts (transformWindowFuncCall
will yield errors, and I can't really wrap my head around what a
window function call would mean there).I therefore propose a simple rearrangement of the grammar,
syntactically disallowing window functions in the outer part of those
contexts (a_expr's inside can't and shouldn't be done much about)
which will allow both RESPECT and IGNORE to become unreserved
keywords, without doing any lexer hacking or abusing the grammar.
I reviewed this today and I think this is a very nice approach.
Thanks for working on it!
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
OK - I've attached another iteration of the patch with Troels' grammar
changes. I think the only issue remaining is what the standard says about
lead semantics. Thanks -
Attachments:
lead-lag-ignore-nulls.patchapplication/octet-stream; name=lead-lag-ignore-nulls.patchDownload+351-70
The result of the current patch using lead
Actually, I think I agree with you (and, FWIW, so does Oracle:
http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm#autoId18).
I've refactored the window function's implementation so that (e.g.) lead(5)
means the 5th non-null value away in front of the current row (the previous
implementation was the last non-null value returned if the 5th rows in
front was null). These semantics are slower, as the require the function to
scan through the tuples discarding non-null ones. I've made the
implementation use a bitmap in the partition context to cache whether or
not a given tuple produces a null. This seems correct (it passes the
regression tests) but as it stores row offsets (which are int64s) I was
careful not to use bitmap methods that use ints to refer to set members.
I've added more explanation in the code's comments. Thanks -