Add RANGE with values and exclusions clauses to the Window Functions
Adds RANGE BETWEEN with a start and end value, as well as an
exclusions clause, to the window functions. This partially resolves
TODO list item "Implement full support for window framing clauses".
== Specification ==
The window functions already allow a "ROWS BETWEEN start_value
PRECEDING/FOLLOWING AND end_value PRECEDING/FOLLOWING" to restrict the
number of rows within a partition that are piped into an aggregate
function based on their position before or after the current row. This
patch adds an equivalent for RANGE which restricts the rows based on
whether the _values_ of the ORDER BY column for all other rows in the
partition are within the start_value and end_value bounds. This brings
PostgreSQL to parity with Oracle, and implements a SQL:2011 standard
feature.
SQL:2011 also defines a window frame exclusion clause, which excludes
certain rows from the result. This clause doesn't seem to be
implemented in any mainstream RDBMS (MariaDb mentions that fact in its
documentation here:
https://mariadb.com/kb/en/library/window-functions-overview/ and has
it on its TODO list). This patch implements three EXCLUDE clauses
described in the standard:
EXCLUDE CURRENT ROW - excludes the current row from the result
EXCLUDE TIES - excludes identical rows from the result
EXCLUDE NO OTHERS - does nothing, is the default behavior; exists
purely to describe the intention not to exclude any other rows
The RANGE BETWEEN clause requires a single ORDER BY column which must
be either an integer or a date/time type. If the column is a date/time
type then start_value and end_value must both be an interval type. If
the column is an integer, then the values must both be integers.
== Testing ==
Tested on Windows with MinGW. All existing regression tests pass. New
tests and updated documentation is included. Tests show both the new
RANGE with values working and the exclusion clause working in both
RANGE and ROWS mode.
== Future Work ==
The standard also defines, in addition to RANGE and ROWS, a GROUPS
option with a corresponding EXCLUDE GROUP option. This also doesn't
seem to be implemented anywhere else, and I plan to implement it next.
This patch also adds some new error messages which have not been
internationalized.
Attachments:
0001-window-frame-v1.patchapplication/octet-stream; name=0001-window-frame-v1.patchDownload+1191-118
On 2017-11-24 15:11, Oliver Ford wrote:
Adds RANGE BETWEEN with a start and end value, as well as an
exclusions clause, to the window functions. This partially resolves
TODO list item "Implement full support for window framing clauses".
[0001-window-frame-v1.patch]
(debian 8)
make check fails:
foreign_data ... ok
window ... FAILED
xmlmap ... ok
The diff is:
$ ( cd
/var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress && cat
regression.diffs )
***
/var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress/expected/window.out
2017-11-24 15:36:15.387573714 +0100
---
/var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress/results/window.out
2017-11-24 15:38:35.290553157 +0100
***************
*** 1034,1043 ****
(10 rows)
SELECT pg_get_viewdef('v_window');
! pg_get_viewdef
! ----------------------------------------------
! SELECT i.i, +
! sum(i.i) OVER (ORDER BY i.i) AS sum_rows+
FROM generate_series(1, 10) i(i);
(1 row)
--- 1034,1043 ----
(10 rows)
SELECT pg_get_viewdef('v_window');
! pg_get_viewdef
!
---------------------------------------------------------------------------------------
! SELECT i.i,
+
! sum(i.i) OVER (ORDER BY i.i ROWS BETWEEN 1 PRECEDING AND 1
FOLLOWING) AS sum_rows+
FROM generate_series(1, 10) i(i);
(1 row)
This small hickup didn't prevent building an instance but obviously I
haven't done any real tests yet.
thanks,
Erik Rijkers
On Fri, Nov 24, 2017 at 3:08 PM, Erikjan Rijkers <er@xs4all.nl> wrote:
(debian 8)
make check fails:
foreign_data ... ok
window ... FAILED
xmlmap ... okThe diff is:
$ ( cd /var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress &&
cat regression.diffs )
***
/var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress/expected/window.out
2017-11-24 15:36:15.387573714 +0100
---
/var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress/results/window.out
2017-11-24 15:38:35.290553157 +0100
***************
*** 1034,1043 ****
(10 rows)SELECT pg_get_viewdef('v_window');
! pg_get_viewdef
! ----------------------------------------------
! SELECT i.i, +
! sum(i.i) OVER (ORDER BY i.i) AS sum_rows+
FROM generate_series(1, 10) i(i);
(1 row)--- 1034,1043 ---- (10 rows)SELECT pg_get_viewdef('v_window'); ! pg_get_viewdef ! --------------------------------------------------------------------------------------- ! SELECT i.i, + ! sum(i.i) OVER (ORDER BY i.i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_rows+ FROM generate_series(1, 10) i(i); (1 row)This small hickup didn't prevent building an instance but obviously I
haven't done any real tests yet.
I think something was committed recently that changed the spacing of
pg_get_viewdef on Windows. I had the same spacing as you until I
pulled this morning, so I updated my expected output but now it looks
like whatever's changed on Windows hasn't changed on Linux.....
I'll try and find what caused this change.
On 24 November 2017 at 22:11, Oliver Ford <ojford@gmail.com> wrote:
Adds RANGE BETWEEN with a start and end value, as well as an
exclusions clause, to the window functions. This partially resolves
TODO list item "Implement full support for window framing clauses".
Yay!
I'll try to take a look at this.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Nov 24, 2017 at 3:08 PM, Erikjan Rijkers <er@xs4all.nl> wrote:
SELECT pg_get_viewdef('v_window');
! pg_get_viewdef
! ----------------------------------------------
! SELECT i.i, +
! sum(i.i) OVER (ORDER BY i.i) AS sum_rows+
FROM generate_series(1, 10) i(i);
(1 row)--- 1034,1043 ---- (10 rows)SELECT pg_get_viewdef('v_window'); ! pg_get_viewdef ! --------------------------------------------------------------------------------------- ! SELECT i.i, + ! sum(i.i) OVER (ORDER BY i.i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_rows+ FROM generate_series(1, 10) i(i); (1 row)This small hickup didn't prevent building an instance but obviously I
haven't done any real tests yet.thanks,
Erik Rijkers
After another clone and rebuild it works alright with the correct
spacing on mine, so the attached v2 should all pass. I noticed that I
hadn't added the exclusions clauses to the view defs code, so that's
also in this patch with extra tests to cover it.
Attachments:
0001-window-frame-v2.patchapplication/octet-stream; name=0001-window-frame-v2.patchDownload+1298-116
On Mon, Nov 27, 2017 at 12:06 PM, Oliver Ford <ojford@gmail.com> wrote:
On Fri, Nov 24, 2017 at 3:08 PM, Erikjan Rijkers <er@xs4all.nl> wrote:
SELECT pg_get_viewdef('v_window');
! pg_get_viewdef
! ----------------------------------------------
! SELECT i.i, +
! sum(i.i) OVER (ORDER BY i.i) AS sum_rows+
FROM generate_series(1, 10) i(i);
(1 row)--- 1034,1043 ---- (10 rows)SELECT pg_get_viewdef('v_window'); ! pg_get_viewdef ! --------------------------------------------------------------------------------------- ! SELECT i.i, + ! sum(i.i) OVER (ORDER BY i.i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_rows+ FROM generate_series(1, 10) i(i); (1 row)This small hickup didn't prevent building an instance but obviously I
haven't done any real tests yet.thanks,
Erik Rijkers
After another clone and rebuild it works alright with the correct
spacing on mine, so the attached v2 should all pass. I noticed that I
hadn't added the exclusions clauses to the view defs code, so that's
also in this patch with extra tests to cover it.
Sorry previous patch was in full-commit form and not just a diff.
Attached is it in bare diff form.
Attachments:
0001-window-frame-v3.patchapplication/octet-stream; name=0001-window-frame-v3.patchDownload+1297-115
On 2017-11-27 16:01, Oliver Ford wrote:
Attached is it in bare diff form.
[0001-window-frame-v3.patch]
Thanks, that did indeed fix it:
make && make check now ok.
There were errors in the doc build (unmatched tags); I fixed them in the
attached doc-patch (which should go on top of yours).
(In very limited testing I did not find any problems yet)
thanks,
Erik Rijkers
On 2017-11-27 17:34, Erik Rijkers wrote:
On 2017-11-27 16:01, Oliver Ford wrote:
Attached is it in bare diff form.
[0001-window-frame-v3.patch]
Thanks, that did indeed fix it:
make && make check now ok.
There were errors in the doc build (unmatched tags); I fixed them in
the attached doc-patch (which should go on top of yours).
0001-window-frame-v3-fixtags.diff
now attached, I hope...
Attachments:
0001-window-frame-v3-fixtags.difftext/x-diff; name=0001-window-frame-v3-fixtags.diffDownload+16-16
On Mon, Nov 27, 2017 at 4:40 PM, Erik Rijkers <er@xs4all.nl> wrote:
On 2017-11-27 17:34, Erik Rijkers wrote:
On 2017-11-27 16:01, Oliver Ford wrote:
Attached is it in bare diff form.
[0001-window-frame-v3.patch]
Thanks, that did indeed fix it:
make && make check now ok.
There were errors in the doc build (unmatched tags); I fixed them in
the attached doc-patch (which should go on top of yours).0001-window-frame-v3-fixtags.diff
now attached, I hope...
Cheers here's v4 with the correct docs.
Attachments:
0001-window-frame-v4.patchapplication/octet-stream; name=0001-window-frame-v4.patchDownload+1297-115
On Mon, Nov 27, 2017 at 04:55:17PM +0000, Oliver Ford wrote:
On Mon, Nov 27, 2017 at 4:40 PM, Erik Rijkers <er@xs4all.nl> wrote:
On 2017-11-27 17:34, Erik Rijkers wrote:
On 2017-11-27 16:01, Oliver Ford wrote:
Attached is it in bare diff form.
[0001-window-frame-v3.patch]
Thanks, that did indeed fix it:
make && make check now ok.
There were errors in the doc build (unmatched tags); I fixed them in
the attached doc-patch (which should go on top of yours).0001-window-frame-v3-fixtags.diff
now attached, I hope...
Cheers here's v4 with the correct docs.
I've taken the liberty of adding float8, somewhat mechanically. Do
the docs need some change, assuming that addition is useful?
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Attachments:
0001-window-frame-v5.patchtext/x-diff; charset=us-asciiDownload+1340-116
On Tue, Nov 28, 2017 at 4:38 AM, David Fetter <david@fetter.org> wrote:
On Mon, Nov 27, 2017 at 04:55:17PM +0000, Oliver Ford wrote:
On Mon, Nov 27, 2017 at 4:40 PM, Erik Rijkers <er@xs4all.nl> wrote:
On 2017-11-27 17:34, Erik Rijkers wrote:
On 2017-11-27 16:01, Oliver Ford wrote:
Attached is it in bare diff form.
[0001-window-frame-v3.patch]
Thanks, that did indeed fix it:
make && make check now ok.
There were errors in the doc build (unmatched tags); I fixed them in
the attached doc-patch (which should go on top of yours).0001-window-frame-v3-fixtags.diff
now attached, I hope...
Cheers here's v4 with the correct docs.
I've taken the liberty of adding float8, somewhat mechanically. Do
the docs need some change, assuming that addition is useful?Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
The SQL:2011 standard says that the range values should only be an
integer or interval. My understanding is therefore that the ORDER BY
columns should only be either an integer, with integer range values -
or a date/time, with interval range values.
I think if we go outside the standard we should leave it for another
patch and further discussion. But maybe others would prefer to add
support for more types even if they are non-standard?
On Tue, Nov 28, 2017 at 10:51:19AM +0000, Oliver Ford wrote:
On Tue, Nov 28, 2017 at 4:38 AM, David Fetter <david@fetter.org> wrote:
I've taken the liberty of adding float8, somewhat mechanically. Do
the docs need some change, assuming that addition is useful?Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778The SQL:2011 standard says that the range values should only be an
integer or interval. My understanding is therefore that the ORDER BY
columns should only be either an integer, with integer range values -
or a date/time, with interval range values.I think if we go outside the standard we should leave it for another
patch and further discussion. But maybe others would prefer to add
support for more types even if they are non-standard?
I confess I was thinking more in terms of the use cases I recall from
back when I was training to be a scientist than the restrictions the
standard imposed.
Hapoy to make this extension separate if that's the consensus.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 2017-11-27 17:55, Oliver Ford wrote:
Cheers here's v4 with the correct docs.
This email just to link related thread "Add GROUPS option to the Window
Functions":
/messages/by-id/CAGMVOdtWkb9X7dUh7vjaCaiH34UGFg88unXYTEOub0Rk0swSXw@mail.gmail.com
On Tue, Dec 5, 2017 at 11:12 PM, Erik Rijkers <er@xs4all.nl> wrote:
On 2017-11-27 17:55, Oliver Ford wrote:
Cheers here's v4 with the correct docs.
This email just to link related thread "Add GROUPS option to the Window
Functions":/messages/by-id/CAGMVOdtWkb9X7dUh7vjaCaiH34UGFg88unXYTEOub0Rk0swSXw@mail.gmail.com
After further testing I found that some functions were returning
incorrect values with the new options. The attached patch calculates
the frame head and tail correctly, and in RANGE mode checks for the
sort key being null. The rule I've followed is that a null is out of
range of any other value, including another null.
I've merged the RANGE and GROUPS patches into one as these fixes touch
similar bits of code. So the attached patch will give full window
frame clause support.
Attachments:
0001-window-frame-v6.patchapplication/octet-stream; name=0001-window-frame-v6.patchDownload+2746-151
Oliver Ford <ojford@gmail.com> writes:
[ 0001-window-frame-v6.patch ]
Generally speaking, Postgres tries hard to be an extensible-datatype
system, going beyond the SQL standard's minimum requirements when
necessary to make it so. The reason that we don't already have RANGE
PRECEDING/FOLLOWING support is that nobody was satisfied with only
making it work for integers and datetimes. There was, as I recall, code
implementing more or less what you've got here in the original window
function submission, and we pulled it out before committing because of
that inadequacy. I don't think the fact that some years have gone by
means that we should forget about keeping the feature extensible.
One subsequent discussion about how we might make it work to project
standards was here:
/messages/by-id/51C3B952.60907@2ndquadrant.com
Looking back at that, I notice that we all focused on the way to identify
a suitable "+" or "-" operator, but now I'm thinking that that's not
actually a good factorization, because it'd be subject to undesirable
overflow hazards. That is, if we have an integer sequence like
2147483640
2147483641
2147483642
2147483643
2147483644
and we operate on this with "RANGE FOLLOWING 10", that approach results
in an integer overflow when we try to calculate the limit values. But
there's no real need for an overflow error. Ideally, if we try to form
2147483640 + 10 and notice it's overflowed, we'd treat the bound as
+infinity, because every non-overflowed integer value must be within range.
So the approach I'm imagining now is a datatype-specific support function
along the lines of
in_range(a, b, delta) returns bool
which is supposed to return true if a <= b + delta, or something along
that line --- exact details of the definition TBD --- with the proviso
that if b + delta would overflow then the result is automatically true.
We could probably also delegate the requirement of throwing an error
for negative delta to this function, eliminating the need for the
datatype-independent core code to know how to tell that, which is the
other datatype-dependent behavior needed per spec.
Likely there are two of these, one each for the PRECEDING and FOLLOWING
cases.
As suggested in the above-mentioned thread, we could attach such functions
as support functions in the btree opclass that defines the sort order of
the window frame's ordering column, and the core code could look it up
from there. Extensibility would come from the fact that people can define
new opclasses. Also, I believe we could support multiple such functions
per opclass, allowing the potential to support "delta"s of different
datatypes --- pg_amproc.amproclefttype would correspond to the common
type of a and b, while pg_amproc.amprocrighttype would correspond to
the data type of delta. We certainly need to allow delta to be a
different type from a/b just to handle the spec's timestamp cases.
I'm not sure if there's near-term value in multiple types of delta
values, but it seems easy to allow in this framework.
regards, tom lane
On 01/09/2018 10:59 PM, Tom Lane wrote:
Generally speaking, Postgres tries hard to be an extensible-datatype
system, going beyond the SQL standard's minimum requirements when
necessary to make it so. The reason that we don't already have RANGE
PRECEDING/FOLLOWING support is that nobody was satisfied with only
making it work for integers and datetimes. There was, as I recall, code
implementing more or less what you've got here in the original window
function submission, and we pulled it out before committing because of
that inadequacy. I don't think the fact that some years have gone by
means that we should forget about keeping the feature extensible.
I'm glad I read the thread before I replied. My biggest complaint I had
in my head when reading the initial post was that clamping down on
specific datatypes was distinctly non-PostgreSQL-esque.
I'm -1 on such a patch, even though I would really like this feature.
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Vik Fearing <vik.fearing@2ndquadrant.com> writes:
I'm -1 on such a patch, even though I would really like this feature.
For the record, I'd really like to get this feature in too (and am
willing to help) ... but it needs to be done right.
regards, tom lane
On Tuesday, 9 January 2018, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So the approach I'm imagining now is a datatype-specific support function
along the lines ofin_range(a, b, delta) returns bool
which is supposed to return true if a <= b + delta, or something along
that line --- exact details of the definition TBD --- with the proviso
that if b + delta would overflow then the result is automatically true.We could probably also delegate the requirement of throwing an error
for negative delta to this function, eliminating the need for the
datatype-independent core code to know how to tell that, which is the
other datatype-dependent behavior needed per spec.Likely there are two of these, one each for the PRECEDING and FOLLOWING
cases.
Would you prefer two functions, or a single function with a parameter for
PRECEDING/FOLLOWING? Maybe:
in_range(a, b, delta, following) returns bool
Where following is a bool which is true if FOLLOWING was specified and
false if PRECEDING was specified?
Oliver Ford <ojford@gmail.com> writes:
On Tuesday, 9 January 2018, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So the approach I'm imagining now is a datatype-specific support function
along the lines of
in_range(a, b, delta) returns bool
Likely there are two of these, one each for the PRECEDING and FOLLOWING
cases.
Would you prefer two functions, or a single function with a parameter for
PRECEDING/FOLLOWING? Maybe:
in_range(a, b, delta, following) returns bool
You could do it that way too. The two-function approach seems a little
cleaner and easier to document IMO, but it would create more catalog
bloat, so there's that. I don't have a strong preference.
regards, tom lane
Attached patch implements an extensible version of the RANGE with
values clause. It doesn't actually add any more type support than was
available in previous versions, but is flexible enough for them to be
added easily in further commits.
The new code adds a new family, "in_range_ops", to pg_opfamily. This
can be thought of as a polymorphic function type that is implemented
by adding rows to pg_amproc with the corresponding amprocfamily. In
pg_amproc, a row of this family has an amproclefttype corresponding to
the ORDER BY column's type, an amprocrighttype corresponding to the
offset type, and an amproc with the pg_proc.Oid of the function that
implements "in_range" for these types. The implementing function has a
name in pg_proc of the format:
in_range_<<sort type>>_<<offset type>>_[asc|desc]
For example, "in_range_int8_int4_asc" is the function for a sort
column of type int8, an offset of type int4, and a sort mode of
ascending. In pg_amproc, an amprocnum of 1 is used for an ascending
sort mode, and an amprocnum of 2 for a descending sort mode. Treating
"in_range" as its own opfamily allows the retrieval of the function's
Oid with a single cache lookup. The Oid's for start and end are sent
through the parser and planner into nodeWindowAgg, where the
PGFunction is retrieved. The PGFunction is then called to check if the
row is in range. No other code in nodeWindowAgg has changed from
previous versions of the patch.
The in_range functions have the following signature:
in_range(Datum curr, Datum slot, Datum offset, bool preceding, bool end);
I've tested that the existing regression tests in previous versions
still pass, and also added new tests for descending mode.
I would suggest that we add any further type support as separate
patches on top of this as this patch is already quite large. I'm happy
to work on adding any types that people would like to be supported.