Re: Windowing Function Patch Review -> Standard Conformance
All,
This is my first patch review for PostgreSQL. I did submit a patch last
commit fest (Boyer-Moore) so I feel I should review one this commit fest.
I'm quite new to PostgreSQL so please don't rely on me totally. I'll do my
best. Heikki is also reviewing this patch, which makes me feel better.
My aim is to get the author as much feedback as quickly as possible.
For this reason I'm going to be breaking down my reviews into the
following topics.
1. Does the patch apply cleanly?
2. Any compiler warnings?
3. Do the results follow the SQL standard?
4. Performance comparison: does it perform better than alternate ways of
doing things (self joins, subqueries, etc.)?
5. Performance, no comparison: how does it perform with larger tables?
I regret starting the individual thread for each function now. I was
expecting them to get longer. So I'll use this one for the remainder of the
standard conformance tests.
I covered all of the non-aggregate functions in previous tests. I will do
more final tests on them soon. Tonight I started testing the aggregate
functions with NULLable columns and I've found a small problem. I'll just
post my test script to make it easy for you to see what I mean.
BEGIN WORK;
CREATE TABLE auction_items (
auctionid INT NOT NULL,
category VARCHAR(32) NOT NULL,
description VARCHAR(128) NOT NULL,
highestbid INT NULL, -- NULL when no bids
reserve INT NULL, -- NULL when no reserve
started TIMESTAMP NOT NULL, -- when the auction opened
days INT NOT NULL, -- how many days the auction runs.
CHECK(days > 0),
PRIMARY KEY (auctionid)
);
INSERT INTO auction_items
VALUES(1,'BIKE','Broken Bicycle',NULL,100,NOW(),10);
INSERT INTO auction_items
VALUES(2,'BIKE','Mountain Bike',120,NULL,NOW(),10);
INSERT INTO auction_items
VALUES(3,'BIKE','Racer',230,NULL,NOW(),7);
INSERT INTO auction_items
VALUES(4,'CAR','Bmw M3',5000,6000,NOW(),7);
INSERT INTO auction_items
VALUES(5,'CAR','Audi S3',NULL,6500,NOW(),7);
INSERT INTO auction_items
VALUES(6,'CAR','Holden Kingswood',NULL,2000,NOW(),7);
COMMIT;
-- The following query gives incorrect results on the
-- maxhighbid column
SELECT auctionid,
category,
description,
highestbid,
reserve,
MAX(highestbid) OVER (ORDER BY auctionid) AS maxhighbid,
MAX(reserve) OVER() AS highest_reserve
FROM auction_items;
If you remove the highest_reserve column you get the correct results.
The bug also affects MIN, AVG, COUNT, STDDEV but not SUM.
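To make the expected result concrete, here is a minimal Python sketch (a hypothetical emulation, not PostgreSQL code) of what MAX(highestbid) OVER (ORDER BY auctionid) should produce: since auctionid is unique, the frame is just a running prefix, and a strict transition function simply skips NULL inputs.

```python
def running_max(values):
    """Emulate MAX(x) OVER (ORDER BY unique_key) with a strict transfn."""
    state = None  # aggregate state; stays None until a non-NULL input
    out = []
    for v in values:
        if v is not None:  # strictness: NULL inputs are skipped entirely
            state = v if state is None else max(state, v)
        out.append(state)
    return out

# highestbid values for auctionid 1..6 from the script above
highestbid = [None, 120, 230, 5000, None, None]
print(running_max(highestbid))  # [None, 120, 230, 5000, 5000, 5000]
```

These are the maxhighbid values the query should return regardless of whether highest_reserve is also in the select list.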
I've also had a little look at the documentation included in the patch.
At first scan the only thing I can see that is wrong is
+
+ <para>
+ Window functions are put in the <command>SELECT</command> list.
+ It is forbidden anywhere else such as <literal>GROUP BY</literal>,
I think this needs to be rewritten, as they are allowed in the ORDER BY
clause; that works in the patch and the spec says the same.
Maybe it should read something more like:
"Window functions are only permitted in the <command>SELECT</command> and
the <command>ORDER BY</command> clause of the query. They are forbidden
anywhere else..." etc.
I won't be around until Monday evening (Tuesday morning JST). I'll pick up
the review again then. It's really time for me to push on with this review.
The patch is getting closer to getting the thumbs up from me. I'm really
hoping that can happen on Monday. Then it's over to Heikki for more code
feedback.
Keep up the good work.
David.
2008/11/20 David Rowley <dgrowley@gmail.com>:
-- The following query gives incorrect results on the
-- maxhighbid column
SELECT auctionid,
category,
description,
highestbid,
reserve,
MAX(highestbid) OVER (ORDER BY auctionid) AS maxhighbid,
MAX(reserve) OVER() AS highest_reserve
FROM auction_items;
If you remove the highest_reserve column you get the correct results.
The bug also affects MIN, AVG, COUNT, STDDEV but not SUM.
Good report! I fixed this bug, which was caused by a trivial misuse of
list_concat() in the planner. It occurred when the first aggregate's
transition function is strict and the second aggregate's argument may be
null: the argument of the second was wrongly passed to the first. That's
why it didn't occur if you deleted the second MAX().
I added a test case with the existing data emulating yours (named
"strict aggs") but if it is wrong, let me know.
I've also had a little look at the documentation included in the patch.
At first scan the only thing I can see that is wrong is
+
+ <para>
+ Window functions are put in the <command>SELECT</command> list.
+ It is forbidden anywhere else such as <literal>GROUP BY</literal>,
I think this needs to be rewritten as they are allowed in the ORDER BY
clause, works in patch and spec says the same.
Maybe it should read something more like:
"Window functions are only permitted in the <command>SELECT</command> and
the <command>ORDER BY</command> clause of the query. They are forbidden
anywhere else..." etc.
You're right. I modified this part, using the <literal> tag instead of
<command>. I am not sure about the exact difference between <command> and
<literal>, but I followed what the surrounding text does.
I won't be around until Monday evening (Tuesday morning JST). I'll pick up
the review again then. It's really time for me to push on with this review.
The patch is getting closer to getting the thumbs up from me. I'm really
hoping that can happen on Monday. Then it's over to Heikki for more code
feedback.
This time I only fixed some bugs and rebased against HEAD, but did not
refactor. I can figure out some of what Heikki claimed but not all, so
I'd like to see what he sends and will post another patch if needed.
Regards,
--
Hitoshi Harada
Attachments:
window_functions.patch.20081124-1.gz (application/x-gzip)
2008/11/24 Hitoshi Harada <umi.tanuki@gmail.com>:
2008/11/20 David Rowley <dgrowley@gmail.com>:
-- The following query gives incorrect results on the
-- maxhighbid column
SELECT auctionid,
category,
description,
highestbid,
reserve,
MAX(highestbid) OVER (ORDER BY auctionid) AS maxhighbid,
MAX(reserve) OVER() AS highest_reserve
FROM auction_items;
If you remove the highest_reserve column you get the correct results.
The bug also affects MIN, AVG, COUNT, STDDEV but not SUM.
Good report! I fixed this bug, which was by a trival missuse of
list_concat() in the planner. This was occurred when the first
aggregate trans func is strict and the second aggregate argument may
be null. Yep, the argument of the second was implicitly passed to the
first wrongly. That's why it didn't occur if you delete the second
MAX().
I added a test case with the existing data emulating yours (named
"strict aggs") but if it is wrong, let me know.
I've also had a little look at the documentation included in the patch.
At first scan the only thing I can see that is wrong is
+
+ <para>
+ Window functions are put in the <command>SELECT</command> list.
+ It is forbidden anywhere else such as <literal>GROUP BY</literal>,
I think this needs to be rewritten as they are allowed in the ORDER BY
clause, works in patch and spec says the same.
Maybe it should read something more like:
"Window functions are only permitted in the <command>SELECT</command> and
the <command>ORDER BY</command> clause of the query. They are forbidden
anywhere else..." etc.You're right. I modified this part, with <literal> tag instead of
<command>. I am not sure about the clear difference of <command> and
<literal> but followed what the surroundings tell.I won't be around until Monday evening (Tuesday morning JST). I'll pickup
the review again there. It's really time for me to push on with this review.
The patch is getting closer to getting the thumbs up from me. I'm really
hoping that can happen on Monday. Then it's over to Heikki for more code
feedback.
This time I only fixed some bugs and rebased against the HEAD, but did
not refactor. I can figure out some part of what Heikki claimed but
not all, so I'd like to see what he will send and post another patch
if needed.
Regards,
--
Hitoshi Harada
--
Hitoshi Harada
Attachments:
window_functions.patch.20081124-2.gz (application/x-gzip)
Hitoshi Harada wrote:
2008/11/20 David Rowley <dgrowley@gmail.com>:
I won't be around until Monday evening (Tuesday morning JST). I'll pickup
the review again there. It's really time for me to push on with this review.
The patch is getting closer to getting the thumbs up from me. I'm really
hoping that can happen on Monday. Then it's over to Heikki for more code
feedback.
This time I only fixed some bugs and rebased against the HEAD, but did
not refactor. I can figure out some part of what Heikki claimed but
not all, so I'd like to see what he will send and post another patch
if needed.
Thanks! Here's what I've got so far. I merged your latest patch into
this, as well as the latest changes from CVS HEAD.
It's a bit of a mess, but here's the big changes I've done:
- Removed all the iterator stuff. You can just call
WinGetPart/FrameGetArg repeatedly. It ought to be just as fast if done
right in nodeWindow.c.
- Removed all the Shrinked/Extended stuff, as it's not needed until we
have full support for window frames.
- Removed all the WinRow* functions, you can just call WinFrame/PartGet*
functions, using WINDOW_SEEK_CURRENT
- Removed WinFrame/PartGetTuple functions. They were only used for
determining if two tuples are peer with each other, so I added a
WinRowIsPeer function instead.
- Removed all the buffering strategy stuff. Currently, the whole
partition is always materialized. That's of course not optimal; I'm
still thinking we should just read the tuples from the outer node
lazily, on-demand, instead. To avoid keeping all of the partition's tuples
in the tuplestore when they're not needed, we should add an extra function
to trim the tuplestore.
There's now a lot less code in nodeWindow.c, and I'm starting to
understand most of what's left :-).
TODO
- clean up the comments and other mess.
- Modify WinPart/FrameGetArg so that it's passed the parameter number
instead of an Expr node.
I'll continue working on the executor, but please let me know what you
think.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Alvaro Herrera wrote:
Heikki Linnakangas wrote:
Hitoshi Harada wrote:
This time I only fixed some bugs and rebased against the HEAD, but did
not refactored. I can figure out some part of what Heikki claimed but
not all, so I'd like to see what he will send and post another patch
if needed.
Thanks! Here's what I've got so far. I merged your latest patch into
this, as well as the latest changes from CVS HEAD.
You forgot an attachment it seems ... ?
Yes, I did. I noticed that right after clicking Send, and sent the patch
as a separate mail after that. But apparently it didn't reach the list.
I guess it was above the size limit.
Here's another try, this time compressed with bzip2, which is a lot tighter.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachments:
windowfunc-heikki-1.patch.bz2 (application/x-bzip)
2008/11/24 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
Hitoshi Harada wrote:
2008/11/20 David Rowley <dgrowley@gmail.com>:
I won't be around until Monday evening (Tuesday morning JST). I'll pickup
the review again there. It's really time for me to push on with this
review.
The patch is getting closer to getting the thumbs up from me. I'm really
hoping that can happen on Monday. Then it's over to Heikki for more code
feedback.
This time I only fixed some bugs and rebased against the HEAD, but did
not refactor. I can figure out some part of what Heikki claimed but
not all, so I'd like to see what he will send and post another patch
if needed.
Thanks! Here's what I've got so far. I merged your latest patch into this,
as well as the latest changes from CVS HEAD.
It's a bit of a mess, but here's the big changes I've done:
- Removed all the iterator stuff. You can just call WinGetPart/FrameGetArg
repeatedly. It ought to be just as fast if done right in nodeWindow.c.
- Removed all the Shrinked/Extended stuff, as it's not needed until we have
full support for window frames.
- Removed all the WinRow* functions, you can just call WinFrame/PartGet*
functions, using WINDOW_SEEK_CURRENT
- Removed WinFrame/PartGetTuple functions. They were only used for
determining if two tuples are peer with each other, so I added a
WinRowIsPeer function instead.
- Removed all the buffering strategy stuff. Currently, the whole partition
is always materialized. That's of course not optimal; I'm still thinking we
should just read the tuples from the outer node lazily, on-demand, instead.
To avoid keeping all tuples in the partition in tuplestore, when not needed,
should add an extra function to trim the tuplestore.
There's now a lot less code in nodeWindow.c, and I'm starting to understand
most of what's left :-).
TODO
- clean up the comments and other mess.
- Modify WinPart/FrameGetArg so that it's passed the parameter number
instead of an Expr node.
I'll continue working on the executor, but please let me know what you
think.
These are amazing changes and I like them. Actually I wanted to get it
slimmer but hadn't found the way.
Two points: the frame concept, and window_gettupleslot.
If you already keep this in mind, please don't be annoyed, but the
current frame concept is wrong.
-- yours
sample=# select depname, empno, salary, last_value(empno) over(order
by salary) from empsalary;
depname | empno | salary | last_value
-----------+-------+--------+------------
personnel | 5 | 3500 | 5
personnel | 2 | 3900 | 2
develop | 7 | 4200 | 7
develop | 9 | 4500 | 9
sales | 4 | 4800 | 4
sales | 3 | 4800 | 3
sales | 1 | 5000 | 1
develop | 11 | 5200 | 11
develop | 10 | 5200 | 10
develop | 8 | 6000 | 8
(10 rows)
-- mine
sample=# select depname, empno, salary, last_value(empno) over(order
by salary) from empsalary;
depname | empno | salary | last_value
-----------+-------+--------+------------
personnel | 5 | 3500 | 5
personnel | 2 | 3900 | 2
develop | 7 | 4200 | 7
develop | 9 | 4500 | 9
sales | 4 | 4800 | 3
sales | 3 | 4800 | 3
sales | 1 | 5000 | 1
develop | 11 | 5200 | 10
develop | 10 | 5200 | 10
develop | 8 | 6000 | 8
(10 rows)
Note that for empno=4, last_value=4 (yours) / 3 (mine), which means the
frame applies to last_value() as well as nth_value() and first_value(),
not only to aggregates. So these lines in nodeWindow.c,
/*
* If there is an ORDER BY (we don't support other window frame
* specifications yet), the frame runs from first row of the partition
* to the current row. Otherwise the frame is the whole partition.
*/
if (((Window *) winobj->winstate->ss.ps.plan)->ordNumCols == 0)
max_pos = winobj->partition_rows - 1;
else
max_pos = winobj->currentpos;
max_pos is bogus. RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
means max_pos should be currentpos plus the number of peer rows. I looked
over the patch, and aggregates handle this with winobj->aggregatedupto,
but what about other window functions?
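The peer rule can be sketched in Python (a hypothetical emulation of the SQL frame semantics, not the patch's C code): under RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, the frame tail is the last peer of the current row, so last_value() over rows sorted by salary returns the last peer's empno.

```python
def last_value_over(sorted_rows, key, val):
    """sorted_rows must already be ordered by key; rows with equal keys
    are peers, and the frame tail is the last peer of the current row."""
    out = []
    for i, row in enumerate(sorted_rows):
        j = i
        # advance the frame tail past all peers of the current row
        while j + 1 < len(sorted_rows) and key(sorted_rows[j + 1]) == key(row):
            j += 1
        out.append(val(sorted_rows[j]))
    return out

# (salary, empno) pairs from the empsalary example, sorted by salary
rows = [(3500, 5), (3900, 2), (4200, 7), (4500, 9), (4800, 4),
        (4800, 3), (5000, 1), (5200, 11), (5200, 10), (6000, 8)]
print(last_value_over(rows, key=lambda r: r[0], val=lambda r: r[1]))
# -> [5, 2, 7, 9, 3, 3, 1, 10, 10, 8]  (matches the "mine" output above)
```

Note how the two salary=4800 rows both get last_value=3, and the two salary=5200 rows both get 10.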
Then, window_gettupleslot looks like it kills performance in some cases.
What if the partition has millions of rows, and wfunc1 seeks the head
while wfunc2 seeks the tail?
As you point out, it is possible to define the frame and partition while
scanning the current rows rather than before scanning, but I felt it
would cost more. I know this is work in progress, though, so you have
probably thought about solutions. What do you think?
Regards,
--
Hitoshi Harada
2008/11/25 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
Hitoshi Harada wrote:
If you keep this in your mind, please don't be annoyed but the current
frame concept is wrong....
Note that on empno=4 then last_value=4(yours)/3(mine), which means the
frame is applied to last_value() as well as nth_value() and
first_value(). Not only to aggregates. So these lines in nodeWindow.c,
Yes, you're right.
Hmm, I wonder if we should implement first_value, last_value and nth_value
as regular aggregates. That way, all the window functions would operate on
the partition, not caring about the frame, and all the aggregates would
operate on the frame.
No way. The current specification, where the only frames we support are
with/without ORDER BY, doesn't conflict with your proposal. However,
thinking about the future, we decided to add window aggregates with
wfunc='t', with a subfunc to speed up shrinking frames. The direction we
should head in is converting aggregates into a subset of window
functions, not vice versa.
Then window_gettupleslot looks like killing performance in some cases.
What if the partition has millions of rows and wfunc1 seeks the head
and wfunc2 seeks the tail?
Yeah, we probably should do something about that. You used several different
read pointers to the tuplestore in your patch, one for frame head, another
for frame tail, another for partition head etc., but I think that
potentially suffers from the same problem. Perhaps we should have one read
pointer for each window function, plus one reading the current row? It seems
likely that one window function is not going to hop around wildly, and the
behavior wouldn't depend so much on what other window functions are being
used.
Yes. My patch also has a performance problem in seeking, but it is better
than a single read pointer. So I like your proposal to attach a read
pointer to each wfunc. I am not sure what kinds of window functions users
will define in the future, but for current use cases it seems reasonable.
Anyway, only one read pointer would cause problems, I guess.
So as you point it is possible to define frame and partition while
scanning current rows rather than before scanning, I felt it'd cost
more. But I know this is work in progress, so you probably think about
the solutions. What do you think?
Here's an updated patch, where the rows are fetched on-demand.
Good! And I like fetching the args by number better. Let me take more
time to look at it in detail...
Regards,
--
Hitoshi Harada
Hitoshi Harada wrote:
2008/11/20 David Rowley <dgrowley@gmail.com>:
-- The following query gives incorrect results on the
-- maxhighbid column
SELECT auctionid,
category,
description,
highestbid,
reserve,
MAX(highestbid) OVER (ORDER BY auctionid) AS maxhighbid,
MAX(reserve) OVER() AS highest_reserve
FROM auction_items;
If you remove the highest_reserve column you get the correct results.
The bug also affects MIN, AVG, COUNT, STDDEV but not SUM.
Good report! I fixed this bug, which was by a trival missuse of
list_concat() in the planner. This was occurred when the first
aggregate trans func is strict and the second aggregate argument may
be null. Yep, the argument of the second was implicitly passed to the
first wrongly. That's why it didn't occur if you delete the second
MAX().
I added a test case with the existing data emulating yours (named
"strict aggs") but if it is wrong, let me know.
It's not quite right yet. I'm also getting regression tests failing on
window. Let me know if you want the diffs.
I've created a query that uses the table in your regression test.
max_salary1 gives incorrect results. If you remove the max_salary2 column it
gives the correct results.
Please excuse the lack of sanity with the query. I had to do it this way to
get 2 columns with NULLs.
SELECT depname,
empno,
salary,
salary1,
salary2,
MAX(salary1) OVER (ORDER BY empno) AS max_salary1,
MAX(salary2) OVER() AS max_salary2
FROM (SELECT depname,
empno,
salary,
(CASE WHEN salary < 5000 THEN NULL ELSE salary END) AS salary1,
(CASE WHEN salary >= 5000 THEN NULL ELSE salary END) AS salary2
FROM empsalary
) empsalary;
Actual results:
depname | empno | salary | salary1 | salary2 | max_salary1 | max_salary2
-----------+-------+--------+---------+---------+-------------+-------------
sales | 1 | 5000 | 5000 | | | 4800
personnel | 2 | 3900 | | 3900 | | 4800
sales | 3 | 4800 | | 4800 | | 4800
sales | 4 | 4800 | | 4800 | | 4800
personnel | 5 | 3500 | | 3500 | | 4800
develop | 7 | 4200 | | 4200 | | 4800
develop | 8 | 6000 | 6000 | | | 4800
develop | 9 | 4500 | | 4500 | | 4800
develop | 10 | 5200 | 5200 | | | 4800
develop | 11 | 5200 | 5200 | | | 4800
Correct results:
depname | empno | salary | salary1 | salary2 | max_salary1 | max_salary2
-----------+-------+--------+---------+---------+-------------+-------------
sales | 1 | 5000 | 5000 | | 5000 | 4800
personnel | 2 | 3900 | | 3900 | 5000 | 4800
sales | 3 | 4800 | | 4800 | 5000 | 4800
sales | 4 | 4800 | | 4800 | 5000 | 4800
personnel | 5 | 3500 | | 3500 | 5000 | 4800
develop | 7 | 4200 | | 4200 | 5000 | 4800
develop | 8 | 6000 | 6000 | | 6000 | 4800
develop | 9 | 4500 | | 4500 | 6000 | 4800
develop | 10 | 5200 | 5200 | | 6000 | 4800
develop | 11 | 5200 | 5200 | | 6000 | 4800
This might be a good regression test once it's fixed.
I'm at a bit of a loss as to what to do now. Should I wait for your work,
Heikki? Or continue validating this patch?
The best thing I can think to do right now is continue, and you can add
regression tests for any problems I find; then, if we keep your
regression tests for Heikki's changes, we can validate those changes more
quickly.
Any thoughts? Better ideas?
David.
2008/11/26 David Rowley <dgrowley@gmail.com>:
Hitoshi Harada wrote:
2008/11/20 David Rowley <dgrowley@gmail.com>:
-- The following query gives incorrect results on the
-- maxhighbid column
SELECT auctionid,
category,
description,
highestbid,
reserve,
MAX(highestbid) OVER (ORDER BY auctionid) AS maxhighbid,
MAX(reserve) OVER() AS highest_reserve
FROM auction_items;
If you remove the highest_reserve column you get the correct results.
The bug also affects MIN, AVG, COUNT, STDDEV but not SUM.
Good report! I fixed this bug, which was by a trival missuse of
list_concat() in the planner. This was occurred when the first
aggregate trans func is strict and the second aggregate argument may
be null. Yep, the argument of the second was implicitly passed to the
first wrongly. That's why it didn't occur if you delete the second
MAX().
I added a test case with the existing data emulating yours (named
"strict aggs") but if it is wrong, let me know.
It's not quite right yet. I'm also getting regression tests failing on
window. Let me know if you want the diffs.
I've created a query that uses the table in your regression test.
max_salary1 gives incorrect results. If you remove the max_salary2 column it
gives the correct results.
Please excuse the lack of sanity with the query. I had to do it this way to
get 2 columns with NULLs.
SELECT depname,
empno,
salary,
salary1,
salary2,
MAX(salary1) OVER (ORDER BY empno) AS max_salary1,
MAX(salary2) OVER() AS max_salary2
FROM (SELECT depname,
empno,
salary,
(CASE WHEN salary < 5000 THEN NULL ELSE salary END) AS salary1,
(CASE WHEN salary >= 5000 THEN NULL ELSE salary END) AS salary2
FROM empsalary
) empsalary;
Actual results:
depname | empno | salary | salary1 | salary2 | max_salary1 | max_salary2
-----------+-------+--------+---------+---------+-------------+-------------
sales | 1 | 5000 | 5000 | | | 4800
personnel | 2 | 3900 | | 3900 | | 4800
sales | 3 | 4800 | | 4800 | | 4800
sales | 4 | 4800 | | 4800 | | 4800
personnel | 5 | 3500 | | 3500 | | 4800
develop | 7 | 4200 | | 4200 | | 4800
develop | 8 | 6000 | 6000 | | | 4800
develop | 9 | 4500 | | 4500 | | 4800
develop | 10 | 5200 | 5200 | | | 4800
develop | 11 | 5200 | 5200 | | | 4800
Correct results:
depname | empno | salary | salary1 | salary2 | max_salary1 | max_salary2
-----------+-------+--------+---------+---------+-------------+-------------
sales | 1 | 5000 | 5000 | | 5000 | 4800
personnel | 2 | 3900 | | 3900 | 5000 | 4800
sales | 3 | 4800 | | 4800 | 5000 | 4800
sales | 4 | 4800 | | 4800 | 5000 | 4800
personnel | 5 | 3500 | | 3500 | 5000 | 4800
develop | 7 | 4200 | | 4200 | 5000 | 4800
develop | 8 | 6000 | 6000 | | 6000 | 4800
develop | 9 | 4500 | | 4500 | 6000 | 4800
develop | 10 | 5200 | 5200 | | 6000 | 4800
develop | 11 | 5200 | 5200 | | 6000 | 4800
This might be a good regression test once it's fixed.
Hmm, did you apply the latest patch correctly? My build produces the
right results, so I don't see why it isn't fixed. Make sure the lines
around 2420-2430 in planner.c look like:
/*
* must copyObject() to avoid args concatenating with each other.
*/
pulled_exprs = list_concat(pulled_exprs, copyObject(wfunc->args));
where copyObject() is added.
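For readers unfamiliar with the pitfall, here is a rough Python analogue (dicts standing in for planner Node trees; all names are made up for illustration) of why concatenating argument lists without the copyObject() step lets two aggregates share, and corrupt, the same nodes:

```python
import copy

# Two window aggregates whose argument lists the planner pulls up.
wfunc1 = {"name": "max", "args": [{"var": "highestbid"}]}
wfunc2 = {"name": "max", "args": [{"var": "reserve"}]}

# Buggy: plain concatenation shares the underlying node objects, so
# rewriting a node in the pulled-up list rewrites the original tree too.
pulled = wfunc1["args"] + wfunc2["args"]
pulled[1]["var"] = "rewritten"
assert wfunc2["args"][0]["var"] == "rewritten"   # original tree changed!

# Fixed (the copyObject() equivalent): copy each args list first.
wfunc2["args"] = [{"var": "reserve"}]            # reset for the demo
pulled = copy.deepcopy(wfunc1["args"]) + copy.deepcopy(wfunc2["args"])
pulled[1]["var"] = "rewritten"
assert wfunc2["args"][0]["var"] == "reserve"     # original untouched
```

The C list_concat() case is the same shape: without copying, the second aggregate's argument nodes end up wired into the first aggregate's evaluation.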
I'm not sure if this is related, but another bug was found:
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
***************
*** 2246,2251 **** equal(void *a, void *b)
--- 2246,2252 ----
break;
case T_Aggref:
retval = _equalAggref(a, b);
+ break;
case T_WFunc:
retval = _equalWFunc(a, b);
break;
don't laugh at it... could you try it and test again?
If the whole new patch is needed, tell me and I will send it.
I'm at a bit of a loss to what to do now. Should I wait for your work
Heikki? Or continue validating this patch?
The best thing I can think to do right now is continue and any problems I
find you can add regression tests for, then if we keep your regression tests
for Heikki's changes then we can validate those changes more quickly.
Any thoughts? Better ideas?
Thanks to your great tests, we now know much more about the specification
and where it fails easily, so continuing makes sense, but it may be a
good time to take a rest and wait for Heikki's patch to be completed.
Regards,
--
Hitoshi Harada
2008/11/25 Hitoshi Harada <umi.tanuki@gmail.com>:
2008/11/25 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
Here's an updated patch, where the rows are fetched on-demand.
Good! And I like the fetching args by number better. Let me take more
time to look in detail...
I read more, and your spooling approach seems flexible for both now and
the future. Looking only at the current release, the frame with ORDER BY
is handled by detecting peers in WinFrameGetArg() and adding the number
of peer rows to winobj->currentpos. Actually, if we have the capability
to spool all the rows we need on demand, the frame becomes only a
boundary problem.
It seems to me that eval_windowaggregate() should also use the frame
APIs. The only thing we have to care about is the shrinking frame, which
is not supported in this release. So I'd suggest removing
winobj->aggregatedupto. Is there any objection?
Regards,
--
Hitoshi Harada
David Rowley wrote:
I've created a query that uses the table in your regression test.
max_salary1 gives incorrect results. If you remove the max_salary2 column it
gives the correct results.
Thanks. I saw this myself yesterday while hacking on the patch. I thought
it was a bug I had introduced, but apparently it was there all along.
Anyway, it's fixed in the latest version, which I will send shortly.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Hitoshi Harada wrote:
2008/11/26 David Rowley <dgrowley@gmail.com>:
I'm at a bit of a loss to what to do now. Should I wait for your work
Heikki? Or continue validating this patch?
The best thing I can think to do right now is continue and any problems I
find you can add regression tests for, then if we keep your regression tests
for Heikki's changes then we can validate those changes more quickly.
Any thoughts? Better ideas?
Thanks to your great tests, we now know much more about specification
and where to fail easily, so continuing makes sense but it may be good
time to take a rest and wait for Heikki's patch completing.
Here's another updated patch, including all your bug fixes.
There are two known issues:
- ranking functions still don't treat peer rows correctly.
- I commented out the "this function requires ORDER BY clause in the
window" test in rank_up, because a window function shouldn't be poking
into the WindowState struct like that. I wonder if it's really needed?
In section 7.11, the SQL2008 spec says "if WD has no window ordering
clause, then the window ordering is implementation-dependent, and *all
rows are peers*". The regression test now fails because of this, but the
current behavior actually seems correct to me.
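That reading of the spec can be sketched in Python (a hypothetical emulation, not the patch itself): rank() assigns equal ranks to peer rows with gaps after each peer group, and with no window ordering clause every row is a peer of every other, so every rank is 1.

```python
def rank(order_keys):
    """rank() over rows already sorted by order_keys; rows with equal
    keys are peers, sharing a rank, with gaps after each peer group."""
    out, r = [], 0
    for i, k in enumerate(order_keys):
        if i == 0 or k != order_keys[i - 1]:
            r = i + 1  # rank jumps past the previous peer group
        out.append(r)
    return out

print(rank([3500, 4800, 4800, 5000]))  # -> [1, 2, 2, 4]
# No window ordering clause: all rows are peers, so every rank is 1.
print(rank([None, None, None]))        # -> [1, 1, 1]
```

Under this model, returning 1 for every row is well-defined behavior rather than an error, matching the SQL2008 wording quoted above.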
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachments:
windowfunc-heikki-3.bz2 (application/x-bzip)
2008/11/26 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
Hitoshi Harada wrote:
2008/11/26 David Rowley <dgrowley@gmail.com>:
I'm at a bit of a loss to what to do now. Should I wait for your work
Heikki? Or continue validating this patch?
The best thing I can think to do right now is continue and any problems I
find you can add regression tests for, then if we keep your regression
tests
for Heikki's changes then we can validate those changes more quickly.
Any thoughts? Better ideas?
Thanks to your great tests, we now know much more about specification
and where to fail easily, so continuing makes sense but it may be good
time to take a rest and wait for Heikki's patch completing.
Here's another updated patch, including all your bug fixes.
There's two known issues:
- ranking functions still don't treat peer rows correctly.
- I commented out the "this function requires ORDER BY clause in the window"
test in rank_up, because a window function shouldn't be poking into the
WindowState struct like that. I wonder if it's really needed? In section
7.11, the SQL2008 spec says "if WD has no window ordering clause, then the
window ordering is implementation-dependent, and *all rows are peers*". The
regression test now fails because of this, but the current behavior actually
seems correct to me.
Yes, I was wrong. The only reason I put the error in rank() without ORDER
BY is that I hadn't found that part of the spec. It is actually a
reasonable specification, isn't it?
This is a tiny thing, but could "negative transition function" be called
"inverse transition function"? I feel the latter is more readable.
Regards,
--
Hitoshi Harada
Hitoshi Harada wrote:
I read more, and your spooling approach seems flexible for both now
and the future. Looking at only current release, the frame with ORDER
BY is done by detecting peers in WinFrameGetArg() and add row number
of peers to winobj->currentpos. Actually if we have capability to
spool all rows we need on demand, the frame would be only a boundary
problem.
Yeah, we could do that. I'm afraid it would be pretty slow, though, if
there are a lot of peers. That could probably be alleviated with some
sort of caching.
It seems to me that eval_windowaggregate() also should use frame APIs.
Only things we have to care is the shrinking frame, which is not
supported in this release. So I'd suggest winobj->aggregatedupto to be
removed. Is there objection?
Actually, I took a different approach in the latest patch. Window
aggregates now use a separate read pointer into the tuplestore. The
current row is also read using a separate read pointer in ExecWindow.
The aggregate and current row read pointers don't need random access,
which has the nice effect that if you only use window aggregates, not
window functions, the tuplestore doesn't need random access, and doesn't
need to spill to disk as long as the window frame fits in work_mem.
We should still figure out how to make it possible to trim the
tuplestore when window functions that don't need random access are used,
like ROW_NUMBER and RANK. Earlier, I thought we should add a function to
the window object API to explicitly tell it that tuples before
row X are no longer needed. But I'm now starting to wonder if we should
design the window object API more like the tuplestore API, with a read
pointer that you can advance forward or backward, and rewind. That would
probably map more naturally to the underlying tuplestore, and it seems
like it would be just as easy to use in all the existing window functions.
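A minimal sketch of that idea (purely hypothetical names, not the actual window object API): a tuplestore-style read pointer per window function that can advance forward, step back, and rewind over the materialized partition.

```python
class ReadPointer:
    """Hypothetical per-wfunc read pointer over a materialized partition,
    mirroring the tuplestore API: advance, step back, rewind."""

    def __init__(self, tuples):
        self._tuples = tuples  # the spooled partition
        self._pos = 0

    def advance(self):
        """Return the next tuple, or None past the end of the partition."""
        if self._pos >= len(self._tuples):
            return None
        t = self._tuples[self._pos]
        self._pos += 1
        return t

    def backward(self):
        """Step back one position (but not before the partition head)."""
        self._pos = max(0, self._pos - 1)

    def rewind(self):
        self._pos = 0

ptr = ReadPointer(["t1", "t2", "t3"])
assert ptr.advance() == "t1"
assert ptr.advance() == "t2"
ptr.backward()
assert ptr.advance() == "t2"  # re-reads the tuple after stepping back
ptr.rewind()
assert ptr.advance() == "t1"
```

With one such pointer per window function, the engine can observe the minimum position across all pointers and trim the tuplestore below it, which is the trimming question discussed above.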
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
2008/11/26 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
Hitoshi Harada wrote:
I read more, and your spooling approach seems flexible for both now
and the future. Looking at only current release, the frame with ORDER
BY is done by detecting peers in WinFrameGetArg() and add row number
of peers to winobj->currentpos. Actually if we have capability to
spool all rows we need on demand, the frame would be only a boundary
problem.Yeah, we could do that. I'm afraid it would be pretty slow, though, if
there's a lot of peers. That could probably be alleviated with some sort of
caching, though.It seems to me that eval_windowaggregate() also should use frame APIs.
Only things we have to care is the shrinking frame, which is not
supported in this release. So I'd suggest winobj->aggregatedupto to be
removed. Is there objection?Actually, I took a different approach in the latest patch. Window aggregates
now use a separate read pointer into the tuplestore. The current row is also
read using a separate read pointer in ExecWindow. The aggregate and current
row read pointers don't need random access, which has the nice effect that
if you only use window aggregates, not window functions, the tuplestore
doesn't need random access, and doesn't need to spill to disk as long as the
window frame fits in work_mem.We should still figure out how to make it possible to trim the tuplestore,
when window functions that don't need random access are used. Like
ROW_NUMBER and RANK. Earlier, I thought we should add function to the window
object API to explicitly tell that tuples before row X are no longer needed.
But I'm now starting to wonder if we should design the window object API
more like the tuplestore API, with a read pointer that you can advance
forward or backward, and rewind. That would probably map more naturally to
the underlying tuplestore, and it seems like it would be just as easy to use
in all the existing window functions.
That sounds like a complete solution, at least for the current release. I
now understand exactly what tuplestore_trim does. The current-row pointer
doesn't need to go backward, and neither does the aggregate pointer for an
extending frame, nor row_number, rank, etc. Aggregates over a shrinking
frame need random access to rows, as do lead, lag, etc. Even where random
access is needed, this is very flexible for trimming and saving memory.
One small concern is that operations like WinRowIsPeer() can't tell
whether the required row has already been trimmed, though that isn't a big
problem for the existing functions.
You might also be thinking about sharing aggregate code like
initialize/advance/finalize. If you want, I can refactor nodeAgg.c so it
can be shared with nodeWindow.c, which wouldn't conflict with your work.
Regards,
--
Hitoshi Harada
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
Here's another updated patch, including all your bug fixes.
I did a very fast pass through this and have a few comments.
* Don't bother manually updating keywords.sgml. As stated therein, that
table is automatically generated. All you'll accomplish is to cause merge
problems. (The effort would be a lot better spent on the non-boilerplate
parts of the docs anyway.)
* I assume there's going to be some way to create user-defined window
functions?
* It seems fairly unlikely that you can get away with not supporting
any qual expression on a Window plan node. What will you do with HAVING
qualifiers?
* The find_aggref code added to planagg.c (where it doesn't belong anyway)
doesn't seem to be used anywhere.
* In the same vein, I'm unimpressed with moving GetAggInitVal into
execGrouping.c, which it isn't at all related to. I'd just leave it alone
and duplicate the code in nodeWindow.c --- it's not exactly large. If you
did insist on sharing this code it would be appropriate to change the
name and comments to reflect the fact that it's being used for more than
just aggregates, anyhow.
* And in the same vein. var.c is hardly the place to put a
search-for-wfuncs routine.
* It seems like a coin was flipped to determine whether struct and field
names would use "window", "win", or just "w" (I find "WFunc" to be
particularly unhelpful to a reader who doesn't already know what it is).
Please try to reduce the surprise factor. I'd suggest consistently using
"window" in type names, though "win" is an OK prefix for field names
within window-related structs.
* This is a bad idea:
/*
+ * OrderClause -
+ * representation of ORDER BY in Window
+ */
+ typedef SortGroupClause OrderClause;
+
+
+ /*
+ * PartitionClause -
+ * representaition of PATITION BY in Window
+ */
+ typedef SortGroupClause PartitionClause;
If they're just SortGroupClauses, call them that, don't invent an alias.
(Yes, I know SortClause and GroupClause used to be aliases. That was a
bad idea: it confused matters and required lots of useless duplicated
code, except for the places where we didn't duplicate code because we were
willing to assume struct equivalence. There's basically just nothing that
wins about that approach.) In any case, "order" and "partition" are
really bad names to be using here given the number of possible other
meanings for those terms in a DBMS context. If you actually need separate
struct types then names like WindowPartitionClause would be appropriate.
* The API changes chosen for func_get_detail seem pretty bizarre.
Why didn't you just add a new return code FUNCDETAIL_WINDOW?
* The node support needs to be gone over more closely. I noticed just in
random checking that WFunc is missing parse-location support in
nodeFuncs.c and the Query.hasWindow field got added to copyfuncs, outfuncs,
readfuncs, but not equalfuncs.
* Please heed the comment at the top of parallel_schedule about the max
number of tests per parallel group.
* I don't find the test added to opr_sanity.sql to be particularly sane.
We *will* have the ability to add window functions. But it might be
helpful to check that proisagg and proiswfunc aren't both set.
* errcodes.h is not the only place that has to be touched to add a new
error code --- see also sgml/errcodes.sgml, plpgsql/src/plerrcodes.h.
And what is your precedent for using 42813? I don't see that in the
standard. If it's coming from DB2 it's okay, otherwise we should be
using a private 'P' code.
* Please try to eliminate random whitespace changes from the patch
... *particularly* in files that otherwise wouldn't be touched at all
(there are at least two cases of that in this patch)
That's all I have time for right now ... more to come no doubt.
regards, tom lane
On 26/11/2008, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
2008/11/26 David Rowley <dgrowley@gmail.com>:
Hitoshi Harada wrote:
2008/11/20 David Rowley <dgrowley@gmail.com>:
-- The following query gives incorrect results on the
-- maxhighbid column
SELECT auctionid,
category,
description,
highestbid,
reserve,
MAX(highestbid) OVER (ORDER BY auctionid) AS maxhighbid,
MAX(reserve) OVER() AS highest_reserve
FROM auction_items;

If you remove the highest_reserve column you get the correct results.
The bug also affects MIN, AVG, COUNT, STDDEV but not SUM.
Good report! I fixed this bug, which came from a trivial misuse of
list_concat() in the planner. It occurred when the first aggregate's
transition function is strict and the second aggregate's argument may be
null: the second aggregate's argument was implicitly, and wrongly, passed
to the first as well. That's why it didn't occur when you deleted the
second MAX().
I added a test case with the existing data emulating yours (named
"strict aggs"), but if it is wrong, let me know.

It's not quite right yet. I'm also getting regression tests failing on
window. Let me know if you want the diffs.
I've created a query that uses the table in your regression test.
max_salary1 gives incorrect results. If you remove the max_salary2 column it
gives the correct results.
Please excuse the lack of sanity in the query. I had to do it this way to
get 2 columns with NULLs.

SELECT depname,
empno,
salary,
salary1,
salary2,
MAX(salary1) OVER (ORDER BY empno) AS max_salary1,
MAX(salary2) OVER() AS max_salary2
FROM (SELECT depname,
empno,
salary,
(CASE WHEN salary < 5000 THEN NULL ELSE salary END) AS salary1,
(CASE WHEN salary >= 5000 THEN NULL ELSE salary END) AS salary2
FROM empsalary
) empsalary;

Actual results:
depname | empno | salary | salary1 | salary2 | max_salary1 | max_salary2
-----------+-------+--------+---------+---------+-------------+-------------
sales | 1 | 5000 | 5000 | | | 4800
personnel | 2 | 3900 | | 3900 | | 4800
sales | 3 | 4800 | | 4800 | | 4800
sales | 4 | 4800 | | 4800 | | 4800
personnel | 5 | 3500 | | 3500 | | 4800
develop | 7 | 4200 | | 4200 | | 4800
develop | 8 | 6000 | 6000 | | | 4800
develop | 9 | 4500 | | 4500 | | 4800
develop | 10 | 5200 | 5200 | | | 4800
develop | 11 | 5200 | 5200 | | | 4800

Correct results:
depname | empno | salary | salary1 | salary2 | max_salary1 | max_salary2
-----------+-------+--------+---------+---------+-------------+-------------
sales | 1 | 5000 | 5000 | | 5000 | 4800
personnel | 2 | 3900 | | 3900 | 5000 | 4800
sales | 3 | 4800 | | 4800 | 5000 | 4800
sales | 4 | 4800 | | 4800 | 5000 | 4800
personnel | 5 | 3500 | | 3500 | 5000 | 4800
develop | 7 | 4200 | | 4200 | 5000 | 4800
develop | 8 | 6000 | 6000 | | 6000 | 4800
develop | 9 | 4500 | | 4500 | 6000 | 4800
develop | 10 | 5200 | 5200 | | 6000 | 4800
develop | 11 | 5200 | 5200 | | 6000 | 4800

This might be a good regression test once it's fixed.
Hmm, did you apply the latest patch correctly? My build produces the
right results, so I don't see why it isn't fixed. Make sure the lines
around 2420-2430 in planner.c look like:
/*
* must copyObject() to avoid args concatenating with each other.
*/
pulled_exprs = list_concat(pulled_exprs, copyObject(wfunc->args));

where copyObject() is added.
I'm sitting here away from home with a funny feeling I forgot to make install.
I think my home adsl has dropped out so can't confirm that. If it
works for you and not for me last night then I probably did forget.
I'll let you know.
I'm not sure if this is related, another bug is found:
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
***************
*** 2246,2251 **** equal(void *a, void *b)
--- 2246,2252 ----
  			break;
  		case T_Aggref:
  			retval = _equalAggref(a, b);
+ 			break;
  		case T_WFunc:
  			retval = _equalWFunc(a, b);
  			break;

don't laugh at... could you try it and test again?
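The bug above is the classic C switch fall-through: without the break, the T_Aggref case falls straight into the T_WFunc case and its result is overwritten. A standalone toy illustration (hypothetical tags and return values, not the real equal() dispatch):

```c
#include <assert.h>

enum NodeTag { T_AGGREF = 1, T_WFUNC };

/* Buggy version: the missing break lets T_AGGREF fall into T_WFUNC. */
static int
dispatch_buggy(int tag)
{
    int retval = 0;

    switch (tag)
    {
        case T_AGGREF:
            retval = 100;      /* no break: falls through! */
        case T_WFUNC:
            retval = 200;
            break;
    }
    return retval;
}

/* Fixed version, as in the posted diff. */
static int
dispatch_fixed(int tag)
{
    int retval = 0;

    switch (tag)
    {
        case T_AGGREF:
            retval = 100;
            break;             /* the added break */
        case T_WFUNC:
            retval = 200;
            break;
    }
    return retval;
}
```

In equal() the effect was that comparing two Aggref nodes silently ran the WFunc comparison instead, which is exactly the kind of bug that only shows up under testing like David's.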
I'll add the break; there.
If the whole new patch is needed, tell me and I will send it.
I'm at a bit of a loss to what to do now. Should I wait for your work
Heikki? Or continue validating this patch?

The best thing I can think to do right now is continue and any problems I
find you can add regression tests for, then if we keep your regression tests
for Heikki's changes then we can validate those changes more quickly.
Any thoughts? Better ideas?
Thanks to your great tests, we now know much more about the
specification and where things fail easily, so continuing makes sense,
but it may be a good time to take a rest and wait for Heikki's patch to
be completed.
Ok, I'll continue future tests with Heikki's patches.
David
Regards,
--
Hitoshi Harada
2008/11/27 Tom Lane <tgl@sss.pgh.pa.us>:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
Here's another updated patch, including all your bug fixes.
I did a very fast pass through this and have a few comments.
Thanks for your comments. The executor part is now being refactored by
Heikki, but the remaining items are still on me. Note that some of these
are due to my earlier poor understanding.
* Don't bother manually updating keywords.sgml. As stated therein, that
table is automatically generated. All you'll accomplish is to cause merge
problems. (The effort would be a lot better spent on the non-boilerplate
parts of the docs anyway.)
OK, no harm intended here; I just didn't know the rule. Will remove it.
* I assume there's going to be some way to create user-defined window
functions?
Yes, but not for 8.4. Window functions will need specific window
function APIs rather than the existing PG_XXX APIs, and the design of
those APIs affects how the Window executor node is designed. We are
currently designing the APIs. Once that work is complete, users will be
able to create their own functions.
* It seems fairly unlikely that you can get away with not supporting
any qual expression on a Window plan node. What will you do with HAVING
qualifiers?
Window nodes are executed after WHERE, GROUP BY, and HAVING, and before
ORDER BY. Window nodes don't have a qual, and HAVING has no effect on
Window operations.
* The find_aggref code added to planagg.c (where it doesn't belong anyway)
doesn't seem to be used anywhere.
It was once needed to extract the Aggref node in the planner, but isn't
needed anymore. Will remove it.
* In the same vein, I'm unimpressed with moving GetAggInitVal into
execGrouping.c, which it isn't at all related to. I'd just leave it alone
and duplicate the code in nodeWindow.c --- it's not exactly large. If you
did insist on sharing this code it would be appropriate to change the
name and comments to reflect the fact that it's being used for more than
just aggregates, anyhow.
This is now under discussion. nodeWindow has a lot of duplicated code in
initialize/advance/finalize, so we are wondering whether that code should
be shared between the two nodes. If so, GetAggInitVal seems worth sharing
too, along with the other aggregate-specific code. If we decide to keep
them separate, your suggestion to just duplicate GetAggInitVal makes
sense.
* And in the same vein. var.c is hardly the place to put a
search-for-wfuncs routine.
Agreed, but where should it go? Maybe clause.c, or somewhere under parser/?
* It seems like a coin was flipped to determine whether struct and field
names would use "window", "win", or just "w" (I find "WFunc" to be
particularly unhelpful to a reader who doesn't already know what it is).
Please try to reduce the surprise factor. I'd suggest consistently using
"window" in type names, though "win" is an OK prefix for field names
within window-related structs.
I named it WinFunc once, but that sounded too long for such a heavily
used node. I wanted something short like Agg, but neither Win nor Func
alone is appropriate. Also, the name is consistent with the added
pg_proc column named proiswfunc; I suppose it would become proiswinfunc
if we renamed WFunc to WinFunc.
* This is a bad idea:
/*
+ * OrderClause -
+ * representation of ORDER BY in Window
+ */
+ typedef SortGroupClause OrderClause;
+
+
+ /*
+ * PartitionClause -
+ * representaition of PATITION BY in Window
+ */
+ typedef SortGroupClause PartitionClause;

If they're just SortGroupClauses, call them that, don't invent an alias.
(Yes, I know SortClause and GroupClause used to be aliases. That was a
bad idea: it confused matters and required lots of useless duplicated
code, except for the places where we didn't duplicate code because we were
willing to assume struct equivalence. There's basically just nothing that
wins about that approach.) In any case, "order" and "partition" are
really bad names to be using here given the number of possible other
meanings for those terms in a DBMS context. If you actually need separate
struct types then names like WindowPartitionClause would be appropriate.
This is because earlier I didn't understand the windowed-table
specification very well (and when I started, SortClause and GroupClause
were still separate, as you point out). Now I can see the two can simply
be SortGroupClause, nothing special.
* The API changes chosen for func_get_detail seem pretty bizarre.
Why didn't you just add a new return code FUNCDETAIL_WINDOW?
An existing aggregate can also be used as a window function, but we need
to treat that as a special case. A normal aggregate without an OVER
clause is a GROUP BY aggregate; with an OVER clause it's a window
aggregate. For func_get_detail to determine which, the windef variable
must be passed in. Would that be better?
Also, the block starting with the "Oops. Time to die" comment in
ParseFuncOrColumn can be shared between the two types. So I thought the
two are similar and func_get_detail cannot distinguish them by itself:
sometimes they are the same, sometimes not.
* The node support needs to be gone over more closely. I noticed just in
random checking that WFunc is missing parse-location support in
nodeFuncs.c and the Query.hasWindow field got added to copyfuncs, outfuncs,
readfuncs, but not equalfuncs.
Agreed. Will check it.
* Please heed the comment at the top of parallel_schedule about the max
number of tests per parallel group.
Oops. I need to be more heedful.
* I don't find the test added to opr_sanity.sql to be particularly sane.
We *will* have the ability to add window functions. But it might be
helpful to check that proisagg and proiswfunc aren't both set.
Hmmm, but we don't forbid both being set, so the check seems slightly
confusing, especially to someone looking at it later. Ah, is it because
built-in functions don't have multiple src entries?
* errcodes.h is not the only place that has to be touched to add a new
error code --- see also sgml/errcodes.sgml, plpgsql/src/plerrcodes.h.
And what is your precedent for using 42813? I don't see that in the
standard. If it's coming from DB2 it's okay, otherwise we should be
using a private 'P' code.
I'm not sure; I just incremented from GROUPING_ERROR. I googled DB2
codes a bit, but 42813 doesn't make sense there and I found no
appropriate code. Maybe a private 'P' code will do.
* Please try to eliminate random whitespace changes from the patch
... *particularly* in files that otherwise wouldn't be touched at all
(there are at least two cases of that in this patch)
OK.
That's all I have time for right now ... more to come no doubt.
In particular, some of the planner part is probably wrong. Thank you
for checking.
Regards,
--
Hitoshi Harada
I wrote:
Hmm, did you apply the latest patch correctly? My build produces the
right results, so I don't see why it isn't fixed. Make sure the lines
around 2420-2430 in planner.c look like:
/*
* must copyObject() to avoid args concatenating with each other.
*/
pulled_exprs = list_concat(pulled_exprs, copyObject(wfunc->args));
where copyObject() is added.
I'm sitting here away from home with a funny feeling I forgot to make
install.
I think my home adsl has dropped out so can't confirm that. If it
works for you and not for me last night then I probably did forget.
I'll let you know.
Sorry, false alarm. What can I say, it was late. I did make install, just to
the wrong place, so ended up using the old binary.
Sorry for the panic. The bug is fixed. I still get regression failures on
window, though. Do they all pass for you?
I have most of the weekend to test Heikki's work, in fact it's building now.
But I'll have to leave it running...
David.
-----Original Message-----
From: Heikki Linnakangas [mailto:heikki.linnakangas@enterprisedb.com]
Sent: 26 November 2008 09:09
To: Hitoshi Harada
Cc: David Rowley; pgsql-hackers@postgresql.org
Subject: Re: Windowing Function Patch Review -> Standard Conformance

Hitoshi Harada wrote:
2008/11/26 David Rowley <dgrowley@gmail.com>:
I'm at a bit of a loss to what to do now. Should I wait for your work
Heikki? Or continue validating this patch?

The best thing I can think to do right now is continue and any problems I
find you can add regression tests for, then if we keep your regression tests
for Heikki's changes then we can validate those changes more quickly.
Any thoughts? Better ideas?
Thanks to your great tests, we now know much more about specification
and where to fail easily, so continuing makes sense but it may be good
time to take a rest and wait for Heikki's patch completing.

Here's another updated patch, including all your bug fixes.
There's two known issues:
- ranking functions still don't treat peer rows correctly.
- I commented out the "this function requires ORDER BY clause in the
window" test in rank_up, because a window function shouldn't be poking
into the WindowState struct like that. I wonder if it's really needed?
In section 7.11, the SQL2008 spec says "if WD has no window ordering
clause, then the window ordering is implementation-dependent, and *all
rows are peers*". The regression test now fails because of this, but the
current behavior actually seems correct to me.
Thanks for the updated patch. I've been doing a little testing over the
weekend.
I'm running into a small problem with passing results from normal aggregate
functions into the window function. Hitoshi and I saw this before:
SELECT depname,SUM(SUM(salary)) OVER (ORDER BY depname) FROM empsalary GROUP
BY depname;
ERROR: variable not found in subplan target list
Hitoshi fixed this in his 2008-11-12 patch, but it seems to have found its
way back in.
I was also reading over the standard tonight. I've discovered that the
OFFSET in LEAD() and LAG() is optional. It should default to 1 if it is not
present. Oracle seems to support this.
SQL2008 says:
If <lead or lag function> is specified, then:
i) Let VE1 be <lead or lag extent> and let DT be the declared type of VE1.
ii) Case:
1) If <offset> is specified, then let OFF be <offset>. The declared type
of OFF shall be an exact numeric type with scale 0 (zero).
2) Otherwise, let OFF be 1 (one).
That means yet another variant of LEAD() and LAG(), but I think it's well
worth it, both for compliance with the standard and for compatibility
with Oracle.
I've also been thinking about the on-demand buffering for the window
functions that you talked about. In some of my previous performance tests I
noticed that LEAD with a LIMIT clause is not optimal as it will store all
the rows in the tuplestore before applying the LIMIT:
select timestamp,lead(timestamp,1) over (order by id) from bigtable order by
id limit 1;
Is there any way for the on-demand buffering to make the above query faster?
Really only the first 2 rows ordered by id need to be read.
I had previously thought this would be too difficult to implement as the
OFFSET can be variable, but maybe it's possible on-demand. Yet I'd imagine
it's difficult to know when to allow rows to exit the tuplestore as a
LAG(col, someVariableValue) might need those rows later.
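As a rough sketch of the on-demand idea: for a constant offset, a lazy evaluator only has to pull (current row index + offset + 1) rows from the sorted input before it can emit a result, so LIMIT 1 would touch just two rows. This toy model (not the patch's code; Stream and lazy_lead are made-up names) counts how many input rows actually get fetched:

```c
#include <assert.h>

/* Toy model: a sorted input stream that records how many rows it
 * has produced.  A lazy LEAD evaluator pulls rows only as needed. */
typedef struct Stream
{
    const int *vals;
    int        len;
    int        fetched;  /* how many rows were actually read */
} Stream;

/* Fetch the row at idx, pulling input rows on demand up to there.
 * Returns 0 when idx is past the end of the input (LEAD is NULL). */
static int
stream_get(Stream *s, int idx, int *out)
{
    if (idx >= s->len)
        return 0;
    if (idx >= s->fetched)
        s->fetched = idx + 1;  /* simulate on-demand spooling */
    *out = s->vals[idx];
    return 1;
}

/* lead(row, offset): the value at row + offset, fetched lazily. */
static int
lazy_lead(Stream *s, int row, int offset, int *out)
{
    return stream_get(s, row + offset, out);
}
```

The variable-offset case David mentions is exactly what this sketch cannot bound: if the offset depends on the row, you don't know in advance how far ahead (or, for LAG, how far back) you must keep rows, so trimming decisions become data-dependent.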
My next task is to read more of the standard just in case there is anything
else we should know.
David.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com