Windowing Function Patch Review -> Standard Conformance

Started by David Rowleyover 17 years ago31 messageshackers
Jump to latest
#1David Rowley
dgrowleyml@gmail.com

I wrote:

All,

This is my first patch review for PostgreSQL. I did submit a patch last
commit fest (Boyer-Moore) so I feel I should review one this commit fest.
I'm quite new to PostgreSQL so please don't rely on me totally. I'll do my
best. Heikki is also reviewing this patch which makes me feel better.

My aim is to get the author has much feed back as quickly as possible. For
this reason I'm going to be breaking down my reviews into the following
topics.

1. Does patch apply cleanly?

2. Any compiler warnings?

3. Do the results follow the SQL standard?

4. Performance Comparison, does it perform better than alternate ways of
doing things. Self joins, sub queries etc.

5. Performance, no comparison. How does it perform with larger tables?

This thread covers part of 3.

Quoted from SQL:2008
"If CUME_DIST is specified, then the relative rank of a row R is defined as
NP/NR, where NP is defined
to be the number of rows preceding or peer with R in the window ordering of
the window partition of R
and NR is defined to be the number of rows in the window partition of R."

So let me create a quick test case...

create table employees (
id INT primary key,
name varchar(30) not null,
department varchar(30) not null,
salary int not null,
check (salary >= 0)
);

insert into employees values(1,'Jeff','IT',10000);
insert into employees values(2,'Sam','IT',12000);

insert into employees values(3,'Richard','Manager',30000);
insert into employees values(4,'Ian','Manager',20000);

insert into employees values(5,'John','IT',60000);
insert into employees values(6,'Matthew','Director',60000);

My interpretation of the standard should make the last two columns in the
following query equal, and they are in the patch.

SELECT name,CAST(r AS FLOAT) / c, cd
FROM (SELECT name,
ROW_NUMBER() OVER(ORDER BY salary) as r,
COUNT(*) OVER() AS c,
CUME_DIST() OVER(ORDER BY salary) AS cd
FROM employees
) t;

Both Oracle and Sybase say otherwise. Have I (we both) misinterpreted the
standard?

name,cast(t.r as real)/t.c,cd
'Jeff',0.16666666666666666,0.16666666666666666
'Sam',0.3333333333333333,0.3333333333333333
'Ian',0.5,0.5
'Richard',0.6666666666666666,0.6666666666666666
'John',0.8333333333333334,1.0
'Matthew',1.0,1.0

Above are the results from Sybase.

Can anyone see who is correct here? Is it possible that both Oracle and
Sybase have it wrong?

David.

#2Vladimir Sitnikov
sitnikov.vladimir@gmail.com
In reply to: David Rowley (#1)
Re: Windowing Function Patch Review -> Standard Conformance

Quoted from SQL:2008
"If CUME_DIST is specified, then the relative *rank *of a row R is defined
as
NP/NR, where NP is defined
to be the number of rows preceding or peer with R in the window ordering of
the window partition of R
and NR is defined to be the number of rows in the window partition of R."

I guess there is a difference between "row_number" and "number of rows

preceding or peer with R"

"number of rows preceding or peer with R" == count(*) over (order by salary)

As far as I understand, the following query should calculate cume_dist
properly (and it does so in Oracle):

SELECT name,CAST(r AS FLOAT) / c, cd
FROM (SELECT name,
COUNT(*) OVER(ORDER BY salary) as r,
COUNT(*) OVER() AS c,
CUME_DIST() OVER(ORDER BY salary) AS cd
FROM employees
) t;

Sincerely yours,
Vladimir Sitnikov

#3Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Vladimir Sitnikov (#2)
Re: Windowing Function Patch Review -> Standard Conformance

2008/11/5 Vladimir Sitnikov <sitnikov.vladimir@gmail.com>:

Quoted from SQL:2008
"If CUME_DIST is specified, then the relative rank of a row R is defined
as
NP/NR, where NP is defined
to be the number of rows preceding or peer with R in the window ordering
of
the window partition of R
and NR is defined to be the number of rows in the window partition of R."

I guess there is a difference between "row_number" and "number of rows
preceding or peer with R"

"number of rows preceding or peer with R" == count(*) over (order by salary)

As far as I understand, the following query should calculate cume_dist
properly (and it does so in Oracle):

SELECT name,CAST(r AS FLOAT) / c, cd
FROM (SELECT name,
COUNT(*) OVER(ORDER BY salary) as r,
COUNT(*) OVER() AS c,
CUME_DIST() OVER(ORDER BY salary) AS cd
FROM employees
) t;

I'm afraid I misinterpreted it. As you say,

"number of rows preceding == row_number()"

and

"rumber of rows preceding or peers to R != row_number() (neither rank())"

"peers to R" in the window function context means "same rows by the
ORDER BY clause", so in the first example, id=5 and id=6 are peers and
in both rows, NP should be 6, as Oracle and Sybase say.

Even though I understand the definition, your suggestion of COUNT(*)
OVER (ORDER BY salary) doesn't make sense. In the patch, it simply
returns the same value as row_number() but is it wrong, too?

Regards,

--
Hitoshi Harada

#4Vladimir Sitnikov
sitnikov.vladimir@gmail.com
In reply to: Hitoshi Harada (#3)
Re: Windowing Function Patch Review -> Standard Conformance

Even though I understand the definition, your suggestion of COUNT(*)
OVER (ORDER BY salary) doesn't make sense.

Why does not that make sense?
I have not read the spec, however Oracle has a default window specification
in case there is only an order by clause. The default window is "range
between unbounded preceding and current row".

"count(*) over (order by salary range between unbounded preceding and
current row)" is perfectly identical to the "number of rows preceding or
peers to R" by the definition, isn't it? I see here a word-by-word
translation from SQL to the English and vice versa.

If the patch returns "row_number" it is wrong since there is no way for
row_number to be a "number of rows preceding or peer with R", is there?

Regards,
Vladimir Sitnikov

#5Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Vladimir Sitnikov (#4)
Re: Windowing Function Patch Review -> Standard Conformance

2008/11/5 Vladimir Sitnikov <sitnikov.vladimir@gmail.com>:

Even though I understand the definition, your suggestion of COUNT(*)
OVER (ORDER BY salary) doesn't make sense.

Why does not that make sense?
I have not read the spec, however Oracle has a default window specification
in case there is only an order by clause. The default window is "range
between unbounded preceding and current row".

"count(*) over (order by salary range between unbounded preceding and
current row)" is perfectly identical to the "number of rows preceding or
peers to R" by the definition, isn't it? I see here a word-by-word
translation from SQL to the English and vice versa.

If the patch returns "row_number" it is wrong since there is no way for
row_number to be a "number of rows preceding or peer with R", is there?

I've got it.
I had thought that implicit window framing specified by ORDER BY
clause (such like above) would mean ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW. But actually reading the spec more closely it says:

Otherwise, WF consists of all rows of the partition of R that precede
R or are peers of R in the
window ordering of the window partition defined by the window ordering clause.

So it means RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as you
pointed. And the result of count(*) OVER (ORDER BY salary) doesn't
equal to row_number().

Now my assumption is broken. Let me take time to think about how to solve it...

Regards,

--
Hitoshi Harada

#6Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Hitoshi Harada (#5)
Re: Windowing Function Patch Review -> Standard Conformance

David, Vladimir,

2008/11/5 Hitoshi Harada <umi.tanuki@gmail.com>:

Now my assumption is broken. Let me take time to think about how to solve it...

I fixed the points we discussed a few days ago. The delta is here:

http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=bfbc039f68aa749b403c84a803d49a02b3d6c199;hp=bbba638f721a7e1d11cb3ee6af3bc1d7d3c11aa8

although attached is the whole (split) patch.

In addition to fixing cume_dist() and implicit frame definition, I
added two window function APIs, WinRowIsPeer() and WinIterFinish(). Up
to now I've avoided to touch ORDER BY clause comparisons deeply,
because I didn't see any abstraction in that except rank(). But now I
know the very important word "peers" appears so many times in the
spec, I'm inclined to implement some general mechanisms for those APIs
like IsPeer(). Also, as with this version part of RANGE is supported,
the road to the FRAME clause support got shorter than before.

Thanks for your feedback and continuing tests!

Regards,

--
Hitoshi Harada

Attachments:

window_functions.patch.20081107-1.gzapplication/x-gzip; name=window_functions.patch.20081107-1.gzDownload
#7Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Hitoshi Harada (#6)
Re: Windowing Function Patch Review -> Standard Conformance

patch-2

2008/11/8 Hitoshi Harada <umi.tanuki@gmail.com>:

David, Vladimir,

2008/11/5 Hitoshi Harada <umi.tanuki@gmail.com>:

Now my assumption is broken. Let me take time to think about how to solve it...

I fixed the points we discussed a few days ago. The delta is here:

http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=bfbc039f68aa749b403c84a803d49a02b3d6c199;hp=bbba638f721a7e1d11cb3ee6af3bc1d7d3c11aa8

although attached is the whole (split) patch.

In addition to fixing cume_dist() and implicit frame definition, I
added two window function APIs, WinRowIsPeer() and WinIterFinish(). Up
to now I've avoided to touch ORDER BY clause comparisons deeply,
because I didn't see any abstraction in that except rank(). But now I
know the very important word "peers" appears so many times in the
spec, I'm inclined to implement some general mechanisms for those APIs
like IsPeer(). Also, as with this version part of RANGE is supported,
the road to the FRAME clause support got shorter than before.

Thanks for your feedback and continuing tests!

Regards,

--
Hitoshi Harada

--
Hitoshi Harada

Attachments:

window_functions.patch.20081107-2.gzapplication/x-gzip; name=window_functions.patch.20081107-2.gzDownload
#8David Rowley
dgrowleyml@gmail.com
In reply to: Hitoshi Harada (#7)
Re: Windowing Function Patch Review -> Standard Conformance

Hitoshi Harada Wrote:

although attached is the whole (split) patch.

I'm having some trouble getting these patches to patch cleanly. I think it's
because of this that I can't get postgres to compile after applying the
patch.

It errors out at tuptoaster.c some constants seem to be missing from
fmgroids.h

If I open src/include/utils/fmgroids.h

The only constaint being defined is:

#define F__NULL_ 31

I'd assume it was a problem with my setup, but the CVS head compiles fine.

Let me know if I'm doing something wrong.

Here is the output from patch and the compile errors.

david@ubuntu1:~/pg8.4/pgsql$ patch -p1 < window_functions.patch.20081107-1
patching file doc/src/sgml/func.sgml
Hunk #1 succeeded at 10061 (offset 11 lines).
patching file doc/src/sgml/keywords.sgml
patching file doc/src/sgml/queries.sgml
patching file doc/src/sgml/query.sgml
patching file doc/src/sgml/ref/select.sgml
patching file doc/src/sgml/syntax.sgml
patching file src/backend/catalog/pg_aggregate.c
patching file src/backend/catalog/pg_proc.c
patching file src/backend/commands/explain.c
patching file src/backend/executor/Makefile
patching file src/backend/executor/execAmi.c
patching file src/backend/executor/execGrouping.c
patching file src/backend/executor/execProcnode.c
patching file src/backend/executor/execQual.c
Hunk #3 succeeded at 4167 (offset 14 lines).
patching file src/backend/executor/nodeAgg.c
patching file src/backend/executor/nodeWindow.c
patching file src/backend/nodes/copyfuncs.c
patching file src/backend/nodes/equalfuncs.c
patching file src/backend/nodes/nodeFuncs.c
patching file src/backend/nodes/outfuncs.c
patching file src/backend/nodes/readfuncs.c
patching file src/backend/optimizer/path/allpaths.c
patching file src/backend/optimizer/path/equivclass.c
patching file src/backend/optimizer/plan/createplan.c
patching file src/backend/optimizer/plan/planagg.c
patching file src/backend/optimizer/plan/planmain.c
patching file src/backend/optimizer/plan/planner.c
patching file src/backend/optimizer/plan/setrefs.c
patching file src/backend/optimizer/plan/subselect.c
patching file src/backend/optimizer/prep/prepjointree.c
patching file src/backend/optimizer/util/clauses.c
patching file src/backend/optimizer/util/tlist.c
patching file src/backend/optimizer/util/var.c
patching file src/backend/parser/analyze.c
patching file src/backend/parser/gram.y
Hunk #6 succeeded at 6433 (offset 8 lines).
Hunk #7 succeeded at 6443 (offset 8 lines).
Hunk #8 succeeded at 8212 (offset 9 lines).
Hunk #9 succeeded at 8220 (offset 9 lines).
Hunk #10 succeeded at 8232 (offset 9 lines).
Hunk #11 succeeded at 8244 (offset 9 lines).
Hunk #12 succeeded at 8256 (offset 9 lines).
Hunk #13 succeeded at 8272 (offset 9 lines).
Hunk #14 succeeded at 8284 (offset 9 lines).
Hunk #15 succeeded at 8306 (offset 9 lines).
Hunk #16 succeeded at 8754 (offset 9 lines).
Hunk #17 succeeded at 9629 (offset 9 lines).
Hunk #18 succeeded at 9637 (offset 9 lines).
Hunk #19 succeeded at 9703 (offset 9 lines).
Hunk #20 succeeded at 9720 (offset 9 lines).
Hunk #21 succeeded at 9772 (offset 9 lines).
Hunk #22 succeeded at 9959 (offset 9 lines).
Hunk #23 succeeded at 9980 (offset 9 lines).
patching file src/backend/parser/keywords.c
patching file src/backend/parser/parse_agg.c
patching file src/backend/parser/parse_clause.c
patching file src/backend/parser/parse_expr.c
patching file src/backend/parser/parse_func.c
patching file src/backend/rewrite/rewriteManip.c
patching file src/backend/utils/adt/Makefile
patching file src/backend/utils/adt/ruleutils.c
patching file src/backend/utils/adt/wfunc.c
patching file src/backend/utils/sort/tuplestore.c
patching file src/include/catalog/pg_attribute.h
patching file src/include/catalog/pg_class.h
david@ubuntu1:~/pg8.4/pgsql$ patch -p1 < window_functions.patch.20081107-2
(Stripping trailing CRs from patch.)
patching file src/include/catalog/pg_proc.h
Hunk #4 FAILED at 106.
1 out of 4 hunks FAILED -- saving rejects to file
src/include/catalog/pg_proc.h.rej
(Stripping trailing CRs from patch.)
patching file src/include/executor/executor.h
(Stripping trailing CRs from patch.)
patching file src/include/executor/nodeWindow.h
(Stripping trailing CRs from patch.)
patching file src/include/nodes/execnodes.h
Hunk #3 succeeded at 509 (offset 2 lines).
Hunk #4 succeeded at 1586 (offset 2 lines).
(Stripping trailing CRs from patch.)
patching file src/include/nodes/nodes.h
(Stripping trailing CRs from patch.)
patching file src/include/nodes/parsenodes.h
(Stripping trailing CRs from patch.)
patching file src/include/nodes/plannodes.h
(Stripping trailing CRs from patch.)
patching file src/include/nodes/primnodes.h
(Stripping trailing CRs from patch.)
patching file src/include/nodes/relation.h
(Stripping trailing CRs from patch.)
patching file src/include/optimizer/planmain.h
(Stripping trailing CRs from patch.)
patching file src/include/optimizer/var.h
(Stripping trailing CRs from patch.)
patching file src/include/parser/parse_agg.h
(Stripping trailing CRs from patch.)
patching file src/include/parser/parse_clause.h
(Stripping trailing CRs from patch.)
patching file src/include/parser/parse_func.h
(Stripping trailing CRs from patch.)
patching file src/include/parser/parse_node.h
(Stripping trailing CRs from patch.)
patching file src/include/rewrite/rewriteManip.h
(Stripping trailing CRs from patch.)
patching file src/include/utils/builtins.h
Hunk #1 succeeded at 975 (offset 4 lines).
(Stripping trailing CRs from patch.)
patching file src/include/utils/errcodes.h
(Stripping trailing CRs from patch.)
patching file src/include/utils/tuplestore.h
(Stripping trailing CRs from patch.)
patching file src/interfaces/ecpg/preproc/preproc.y
(Stripping trailing CRs from patch.)
patching file src/test/regress/expected/opr_sanity.out
(Stripping trailing CRs from patch.)
patching file src/test/regress/expected/window.out
(Stripping trailing CRs from patch.)
patching file src/test/regress/parallel_schedule
(Stripping trailing CRs from patch.)
patching file src/test/regress/serial_schedule
(Stripping trailing CRs from patch.)
patching file src/test/regress/sql/create_table.sql
(Stripping trailing CRs from patch.)
patching file src/test/regress/sql/opr_sanity.sql
(Stripping trailing CRs from patch.)
patching file src/test/regress/sql/window.sql
david@ubuntu1:~/pg8.4/pgsql$

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv
-I../../../../src/include -D_GNU_SOURCE -c -o tuptoaster.o tuptoaster.c
tuptoaster.c: In function âtoast_delete_datumâ:
tuptoaster.c:1293: error: âF_OIDEQâ undeclared (first use in this function)
tuptoaster.c:1293: error: (Each undeclared identifier is reported only once
tuptoaster.c:1293: error: for each function it appears in.)
tuptoaster.c: In function âtoast_fetch_datumâ:
tuptoaster.c:1372: error: âF_OIDEQâ undeclared (first use in this function)
tuptoaster.c: In function âtoast_fetch_datum_sliceâ:
tuptoaster.c:1567: error: âF_OIDEQâ undeclared (first use in this function)
tuptoaster.c:1577: error: âF_INT4EQâ undeclared (first use in this function)
tuptoaster.c:1585: error: âF_INT4GEâ undeclared (first use in this function)
tuptoaster.c:1589: error: âF_INT4LEâ undeclared (first use in this function)
make[4]: *** [tuptoaster.o] Error 1
make[4]: Leaving directory `/home/david/pg8.4/pgsql/src/backend/access/heap'
make[3]: *** [heap-recursive] Error 2
make[3]: Leaving directory `/home/david/pg8.4/pgsql/src/backend/access'
make[2]: *** [access-recursive] Error 2
make[2]: Leaving directory `/home/david/pg8.4/pgsql/src/backend'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/david/pg8.4/pgsql/src'
make: *** [all] Error 2

#9Hitoshi Harada
umi.tanuki@gmail.com
In reply to: David Rowley (#8)
Re: Windowing Function Patch Review -> Standard Conformance

2008/11/9 David Rowley <dgrowley@gmail.com>:

Hitoshi Harada Wrote:

although attached is the whole (split) patch.

I'm having some trouble getting these patches to patch cleanly. I think it's
because of this that I can't get postgres to compile after applying the
patch.

It errors out at tuptoaster.c some constants seem to be missing from
fmgroids.h

If I open src/include/utils/fmgroids.h

The only constaint being defined is:

#define F__NULL_ 31

I'd assume it was a problem with my setup, but the CVS head compiles fine.

Let me know if I'm doing something wrong.

patching file src/include/catalog/pg_proc.h
Hunk #4 FAILED at 106.
1 out of 4 hunks FAILED -- saving rejects to file
src/include/catalog/pg_proc.h.rej

As it says pg_proc.h may have conflicts. The patch is not against HEAD
but based on the same as the previous (v08) patch. I am remote now so
I'm not sure but could you try to find conflicts in pg_proc.h? The
pg_proc catalog has been added 1 column called prociswfunc (bool) in
the patch. All you have to do is add 'f' in the middle of new-coming
lines.
Sorry for annoying, and I'll be back in hours.

Regards,

--
Hitoshi Harada

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#8)
Re: Windowing Function Patch Review -> Standard Conformance

"David Rowley" <dgrowley@gmail.com> writes:

patching file src/include/catalog/pg_proc.h
Hunk #4 FAILED at 106.
1 out of 4 hunks FAILED -- saving rejects to file
src/include/catalog/pg_proc.h.rej

I imagine you'll find that "hunk #4" covers the entire DATA() body of
the file :-(. It can't possibly apply cleanly if anyone's added or
altered pg_proc entries since the patch was made.

What you'd need to do is manually insert proiswfunc = 'f' entries in
all the existing DATA lines (this is usually not too hard with sed or
an emacs macro), then add whatever new functions the patch defines.
Even figuring out the latter from the patch representation can be a
serious PITA, since they'll be a few lines out of a multi-thousand-line
failed diff hunk.

I'm not sure if Hitoshi is in a position to submit the pg_proc changes
as two separate diffs --- one to add the new column and a separate one
to add in the new functions --- but it'd be a lot easier to deal with
merge issues if he could.

(Now I'll sit back and wait for some fanboy to claim that
$his_favorite_scm could solve this automatically ...)

regards, tom lane

#11Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#10)
Re: Windowing Function Patch Review -> Standard Conformance

Instead of a patch it might be easier to submit the new columns as a
perl script or sed command. We do something like that to make merging
pg_proc easier.

greg

On 8 Nov 2008, at 01:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

"David Rowley" <dgrowley@gmail.com> writes:

patching file src/include/catalog/pg_proc.h
Hunk #4 FAILED at 106.
1 out of 4 hunks FAILED -- saving rejects to file
src/include/catalog/pg_proc.h.rej

I imagine you'll find that "hunk #4" covers the entire DATA() body of
the file :-(. It can't possibly apply cleanly if anyone's added or
altered pg_proc entries since the patch was made.

What you'd need to do is manually insert proiswfunc = 'f' entries in
all the existing DATA lines (this is usually not too hard with sed or
an emacs macro), then add whatever new functions the patch defines.
Even figuring out the latter from the patch representation can be a
serious PITA, since they'll be a few lines out of a multi-thousand-
line
failed diff hunk.

I'm not sure if Hitoshi is in a position to submit the pg_proc changes
as two separate diffs --- one to add the new column and a separate one
to add in the new functions --- but it'd be a lot easier to deal with
merge issues if he could.

(Now I'll sit back and wait for some fanboy to claim that
$his_favorite_scm could solve this automatically ...)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Hitoshi Harada (#9)
Re: Windowing Function Patch Review -> Standard Conformance

2008/11/9 Hitoshi Harada <umi.tanuki@gmail.com>:

2008/11/9 David Rowley <dgrowley@gmail.com>:

Hitoshi Harada Wrote:

although attached is the whole (split) patch.

I'm having some trouble getting these patches to patch cleanly. I think it's
because of this that I can't get postgres to compile after applying the
patch.

It errors out at tuptoaster.c some constants seem to be missing from
fmgroids.h

If I open src/include/utils/fmgroids.h

The only constaint being defined is:

#define F__NULL_ 31

I'd assume it was a problem with my setup, but the CVS head compiles fine.

Let me know if I'm doing something wrong.

patching file src/include/catalog/pg_proc.h
Hunk #4 FAILED at 106.
1 out of 4 hunks FAILED -- saving rejects to file
src/include/catalog/pg_proc.h.rej

As it says pg_proc.h may have conflicts. The patch is not against HEAD
but based on the same as the previous (v08) patch. I am remote now so
I'm not sure but could you try to find conflicts in pg_proc.h? The
pg_proc catalog has been added 1 column called prociswfunc (bool) in
the patch. All you have to do is add 'f' in the middle of new-coming
lines.
Sorry for annoying, and I'll be back in hours.

I recreate the patch against current HEAD, in the git it's here:

http://git.postgresql.org/?p=postgresql.git;a=commit;h=f88970d3c6fb9f99543d873bb7228f4c057c23e0

I tested `patch -p1` with the attached and succeeded to make it work
cleanly. It seems to me that this patch can be applied until another
pg_proc.h entry would introduced in the HEAD.

Regards,

--
Hitoshi Harada

Attachments:

window_functions.patch.20081109-1.gzapplication/x-gzip; name=window_functions.patch.20081109-1.gzDownload+0-2
#13Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Hitoshi Harada (#12)
Re: Windowing Function Patch Review -> Standard Conformance

patch-2

2008/11/9 Hitoshi Harada <umi.tanuki@gmail.com>:

2008/11/9 Hitoshi Harada <umi.tanuki@gmail.com>:

2008/11/9 David Rowley <dgrowley@gmail.com>:

Hitoshi Harada Wrote:

although attached is the whole (split) patch.

I'm having some trouble getting these patches to patch cleanly. I think it's
because of this that I can't get postgres to compile after applying the
patch.

It errors out at tuptoaster.c some constants seem to be missing from
fmgroids.h

If I open src/include/utils/fmgroids.h

The only constaint being defined is:

#define F__NULL_ 31

I'd assume it was a problem with my setup, but the CVS head compiles fine.

Let me know if I'm doing something wrong.

patching file src/include/catalog/pg_proc.h
Hunk #4 FAILED at 106.
1 out of 4 hunks FAILED -- saving rejects to file
src/include/catalog/pg_proc.h.rej

As it says pg_proc.h may have conflicts. The patch is not against HEAD
but based on the same as the previous (v08) patch. I am remote now so
I'm not sure but could you try to find conflicts in pg_proc.h? The
pg_proc catalog has been added 1 column called prociswfunc (bool) in
the patch. All you have to do is add 'f' in the middle of new-coming
lines.
Sorry for annoying, and I'll be back in hours.

I recreate the patch against current HEAD, in the git it's here:

http://git.postgresql.org/?p=postgresql.git;a=commit;h=f88970d3c6fb9f99543d873bb7228f4c057c23e0

I tested `patch -p1` with the attached and succeeded to make it work
cleanly. It seems to me that this patch can be applied until another
pg_proc.h entry would introduced in the HEAD.

Regards,

--
Hitoshi Harada

--
Hitoshi Harada

Attachments:

window_functions.patch.20081109-2.gzapplication/x-gzip; name=window_functions.patch.20081109-2.gzDownload
#14David Rowley
dgrowleyml@gmail.com
In reply to: Hitoshi Harada (#12)
Re: Windowing Function Patch Review -> Standard Conformance

Hitoshi Harada wrote:

I recreate the patch against current HEAD, in the git it's here:

http://git.postgresql.org/?p=postgresql.git;a=commit;h=f88970d3c6fb9f99543
d873bb7228f4c057c23e0

I tested `patch -p1` with the attached and succeeded to make it work
cleanly. It seems to me that this patch can be applied until another
pg_proc.h entry would introduced in the HEAD.

Regards,

--
Hitoshi Harada

Thanks Hitoshi,

You've done what I planned to do first thing this morning. I was planning to
write a script that checks for the new lines then I could have removed
those, patched then re-added them with the extra f.

It looks like it's not easy to keep it working with CVS head. I counted 3
patches that touched pg_proc.h just this month. I must have missed another
as I still couldn't get it to apply. Hence the need for the script.

Thanks for the new patch I'll get testing now.

David.

#15Hitoshi Harada
umi.tanuki@gmail.com
In reply to: David Rowley (#14)
Re: Windowing Function Patch Review -> Standard Conformance

2008/11/9 David Rowley <dgrowley@gmail.com>:

Hitoshi Harada wrote:

I recreate the patch against current HEAD, in the git it's here:

http://git.postgresql.org/?p=postgresql.git;a=commit;h=f88970d3c6fb9f99543
d873bb7228f4c057c23e0

I tested `patch -p1` with the attached and succeeded to make it work
cleanly. It seems to me that this patch can be applied until another
pg_proc.h entry would introduced in the HEAD.

Regards,

--
Hitoshi Harada

Thanks Hitoshi,

You've done what I planned to do first thing this morning. I was planning to
write a script that checks for the new lines then I could have removed
those, patched then re-added them with the extra f.

It looks like it's not easy to keep it working with CVS head. I counted 3
patches that touched pg_proc.h just this month. I must have missed another
as I still couldn't get it to apply. Hence the need for the script.

Thanks for the new patch I'll get testing now.

David.

I'm glad to hear that. Actually thanks to git it is quite easy for me
to merge my own repository with the HEAD. It tells me which lines are
new coming and which lines I modified are newer than else in CVS. This
is my first project where I use git (and I am not guru of cvs either
-:) but I love it now.
So feel free to tell me to recreate a new patch against head. Not so
heavy work as making on-the-fly script.

Regards,

--
Hitoshi Harada

#16David Rowley
dgrowleyml@gmail.com
In reply to: Hitoshi Harada (#15)
Re: Windowing Function Patch Review -> Standard Conformance

Hitoshi Harada wrote:

I'm glad to hear that. Actually thanks to git it is quite easy for me
to merge my own repository with the HEAD. It tells me which lines are
new coming and which lines I modified are newer than else in CVS. This
is my first project where I use git (and I am not guru of cvs either
-:) but I love it now.
So feel free to tell me to recreate a new patch against head. Not so
heavy work as making on-the-fly script.

CUME_DIST and COUNT(*) seem to be working as expected now with the original
tests. I'll continue to test more and also test the other windowing
functions that I've not got to yet.

I did receive a small warning from the compiler when compiling nodeWindow.c.
It was complaining about a possible un-initialised f_shrinking. Just to keep
it quiet I changed the break's to return's after the elog() call.

*** nodeWindow.c        2008-11-09 10:54:50.000000000 +0000
--- nodeWindow.c        2008-11-09 10:39:24.000000000 +0000
***************
*** 818,824 ****
                        case FRAME_CURRENT_RANGE:
                                /* UNSUPPORTED */
                                elog(ERROR, "unknown preceding type %d",
node->preceding_type);
!                               break;
                        case FRAME_VALUE_ROWS:
                                if (node->preceding_rows > 0)
                                        f_shrinking = 0;
--- 818,824 ----
                        case FRAME_CURRENT_RANGE:
                                /* UNSUPPORTED */
                                elog(ERROR, "unknown preceding type %d",
node->preceding_type);
!                               return; /* keep compiler quiet */
                        case FRAME_VALUE_ROWS:
                                if (node->preceding_rows > 0)
                                        f_shrinking = 0;
***************
*** 922,928 ****
                        case FRAME_CURRENT_RANGE:
                                /* UNSUPPORTED */
                                elog(ERROR, "unknown preceding type %d",
node->preceding_type);
!                               break;
                        case FRAME_VALUE_ROWS:
                                if (node->preceding_rows <=
winobj->p_currentpos + 1)
                                        f_shrinking = 1;
--- 922,928 ----
                        case FRAME_CURRENT_RANGE:
                                /* UNSUPPORTED */
                                elog(ERROR, "unknown preceding type %d",
node->preceding_type);
!                               return; /* keep compiler quiet */
                        case FRAME_VALUE_ROWS:
                                if (node->preceding_rows <=
winobj->p_currentpos + 1)
                                        f_shrinking = 1;
#17David Rowley
dgrowleyml@gmail.com
In reply to: Hitoshi Harada (#15)
Re: Windowing Function Patch Review -> Standard Conformance

Using one of my original test tables I'm testing windowing functions with a
GROUP BY.

The following query works as I would expect.

-- Works
SELECT department,
SUM(Salary),
ROW_NUMBER() OVER (ORDER BY department),
SUM(SUM(salary)) OVER (ORDER BY department)
FROM employees
GROUP BY department;

The following one fails with the message.
ERROR: variable not found in subplan target list

-- Does not work.
SELECT department,
SUM(Salary),
ROW_NUMBER() OVER (ORDER BY department),
SUM(SUM(salary)) OVER (ORDER BY department DESC)
FROM employees
GROUP BY department;

I just added the DESC to force it into creating 2 separate windows.

I can re-write the non working query to work using the following:

SELECT department,
salary,
ROW_NUMBER() OVER (ORDER BY department),
SUM(salary) OVER (ORDER BY department DESC)
FROM (SELECT department,
SUM(salary) AS salary
FROM employees
GROUP BY department
) t;

Testing with:

create table employees (
id INT primary key,
name varchar(30) not null,
department varchar(30) not null,
salary int not null,
check (salary >= 0)
);

insert into employees values(1,'Jeff','IT',10000);
insert into employees values(2,'Sam','IT',12000);

insert into employees values(3,'Richard','Manager',30000);
insert into employees values(4,'Ian','Manager',20000);

insert into employees values(5,'John','IT',60000);
insert into employees values(6,'Matthew','Director',60000);

#18David Rowley
dgrowleyml@gmail.com
In reply to: Hitoshi Harada (#15)
Windowing Function Patch Review -> NTILE function

I've done a little testing with NTILE(). I think a check should be added to
the ntile() function in wfunc.c.

david=# select name,salary,ntile(0) over (order by salary) as n from
employees;
ERROR: floating-point exception
DETAIL: An invalid floating-point operation was signaled. This probably
means an out-of-range result or an invalid operation, such as division by
zero.

I tracked that message back to the signal handler in postgres.c :-( simple
fix though. Any value less than 1 does not really make sense to me.

Maybe we should add something like:

if (PG_WINDOW_ARG(0) < 1)
elog(ERROR, "negative or zero ntile argument not allowed");

What do you think?

Oracle errors out on less than 1, Sybase seems not to have ntile.
MSSQL 2008 also errors out on less than 1

David.

#19David Rowley
dgrowleyml@gmail.com
In reply to: Hitoshi Harada (#15)
Windowing Function Patch Review -> NTH_VALUE

I'm having a little trouble understanding the standard for NTH_VALUE(). I
would have assumed that NTH_VALUE(name,1) would return the first name in the
window. The current patch is using 0 for the first.

Here is the paragraph I'm reading in the standard:

"The nth-value function takes an arbitrary <value expression> VE and a
<simple value specification> or a <dynamic parameter specification> that
evaluates to an exact numeric value n with scale 0 (zero) as arguments
and, for each row R of a windowed table, returns the value of VE evaluated
on the n-th row from the first (if FROM FIRST is specified or implied) or
the last (if FROM LAST is specified) row of the window frame of R defined by
a window structure descriptor. In addition, RESPECT NULLS or IGNORE NULLS
can be specified to indicate whether the rows for which VE evaluates to the
null value are preserved or eliminated."

The text "returns the value of VE evaluated on the n-th row from the first".
I find the "from the first" quite difficult to understand. If it said "in
the window partition" then that seems simple. I'm not sure if "from the
first" includes or does not include the first row in the window partition.

Perhaps it's easier to see in an example.

(Using employees table from another thread) for those who missed it:

create table employees (
id INT primary key,
name varchar(30) not null,
department varchar(30) not null,
salary int not null,
check (salary >= 0)
);

insert into employees values(1,'Jeff','IT',10000);
insert into employees values(2,'Sam','IT',12000);

insert into employees values(3,'Richard','Manager',30000);
insert into employees values(4,'Ian','Manager',20000);

insert into employees values(5,'John','IT',60000);
insert into employees values(6,'Matthew','Director',60000);

david=# select *,nth_value(name,1) over (order by id) from employees;
id | name | department | salary | nth_value
----+---------+------------+--------+-----------
1 | Jeff | IT | 10000 |
2 | Sam | IT | 12000 | Sam
3 | Richard | Manager | 30000 | Sam
4 | Ian | Manager | 20000 | Sam
5 | John | IT | 60000 | Sam
6 | Matthew | Director | 60000 | Sam
(6 rows)

"Sam" is the name from the 2nd row in the window partition.

david=# select *,nth_value(name,0) over (order by id) from employees;
id | name | department | salary | nth_value
----+---------+------------+--------+-----------
1 | Jeff | IT | 10000 | Jeff
2 | Sam | IT | 12000 | Jeff
3 | Richard | Manager | 30000 | Jeff
4 | Ian | Manager | 20000 | Jeff
5 | John | IT | 60000 | Jeff
6 | Matthew | Director | 60000 | Jeff

Also does anyone think that a negative nth_value should be disallowed. The
standard does not seem to give any details on this.

david=# select *,nth_value(name,-1) over (order by id) from employees;
id | name | department | salary | nth_value
----+---------+------------+--------+-----------
1 | Jeff | IT | 10000 |
2 | Sam | IT | 12000 |
3 | Richard | Manager | 30000 |
4 | Ian | Manager | 20000 |
5 | John | IT | 60000 |
6 | Matthew | Director | 60000 |

I also cannot find another RDBMS that implements NTH_VALUE to see what they
do.

Does anyone know if one exists?

Anyone out there able to understand what the standard requires in this case?

It just seems strange to have NTH_VALUE(col,1) return the 2nd row when
functions like ROW_NUMBER() work with base 1 rather than base 0.

Any help or comments on this would be appreciated.

David.

#20David Rowley
dgrowleyml@gmail.com
In reply to: Hitoshi Harada (#15)
Windowing Function Patch Review -> ROW_NUMBER without ORDER BY

I've been trying to think of a use case for using ROW_NUMBER() with no ORDER
BY in the window clause.

Using the example table I always seem to be using, for those who missed it
in other threads.

create table employees (
id INT primary key,
name varchar(30) not null,
department varchar(30) not null,
salary int not null,
check (salary >= 0)
);

insert into employees values(1,'Jeff','IT',10000);
insert into employees values(2,'Sam','IT',12000);
insert into employees values(3,'Richard','Manager',30000);
insert into employees values(4,'Ian','Manager',20000);
insert into employees values(5,'John','IT',60000);
insert into employees values(6,'Matthew','Director',60000);

david=# select *,row_number() over () from employees;
id | name | department | salary | row_number
----+---------+------------+--------+------------
1 | Jeff | IT | 10000 | 1
2 | Sam | IT | 12000 | 2
4 | Ian | Manager | 20000 | 3
5 | John | IT | 60000 | 4
6 | Matthew | Director | 60000 | 5
3 | Richard | Manager | 30000 | 6
(6 rows)

row_number seems to assign the rows a number in order of how it reads them
from the heap. Just to confirm...

update employees set salary = salary where id = 3;

david=# select *,row_number() over () from employees;
id | name | department | salary | row_number
----+---------+------------+--------+------------
1 | Jeff | IT | 10000 | 1
2 | Sam | IT | 12000 | 2
4 | Ian | Manager | 20000 | 3
5 | John | IT | 60000 | 4
6 | Matthew | Director | 60000 | 5
3 | Richard | Manager | 30000 | 6
(6 rows)

The spec says: "The ROW_NUMBER function computes the sequential row number,
starting with 1 (one) for the first row, of the row within its window
partition according to the window ordering of the window."

I'm just not sure if we should block this or not.

Does anyone see this as a feature?

Does anyone see this as a bug?

Any feedback is welcome

David.

#21Andreas Joseph Krogh
andreak@officenet.no
In reply to: David Rowley (#20)
#22Vladimir Sitnikov
sitnikov.vladimir@gmail.com
In reply to: Andreas Joseph Krogh (#21)
#23Hitoshi Harada
umi.tanuki@gmail.com
In reply to: David Rowley (#20)
#24Hitoshi Harada
umi.tanuki@gmail.com
In reply to: David Rowley (#17)
#25Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Hitoshi Harada (#24)
#26Hitoshi Harada
umi.tanuki@gmail.com
In reply to: David Rowley (#18)
#27Hitoshi Harada
umi.tanuki@gmail.com
In reply to: David Rowley (#19)
#28David Rowley
dgrowleyml@gmail.com
In reply to: Hitoshi Harada (#23)
#29Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Hitoshi Harada (#27)
#30Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Hitoshi Harada (#29)
#31Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Hitoshi Harada (#30)