GROUPING SETS revisited
In case anyone's interested, I've taken the CTE-based grouping sets patch from
[1]: http://archives.postgresql.org/pgsql-hackers/2009-05/msg00700.php
it for whitespace consistency, style conformity, or anything else, but (tuits
permitting) hope to figure out how it works and get it closer to commitability
in some upcoming commitfest.
I mention it here so that if someone else is working on it, we can avoid
duplicated effort, and to see if a CTE-based grouping sets implementation is
really the way we think we want to go.
[1]: http://archives.postgresql.org/pgsql-hackers/2009-05/msg00700.php
--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
Attachments:
gs91.patchtext/x-diff; charset=us-asciiDownload+468-106
Hello
2010/8/3 Joshua Tolley <eggyknap@gmail.com>:
In case anyone's interested, I've taken the CTE-based grouping sets patch from
[1] and made it apply to 9.1, attached. I haven't yet done things like checked
it for whitespace consistency, style conformity, or anything else, but (tuits
permitting) hope to figure out how it works and get it closer to commitability
in some upcoming commitfest.I mention it here so that if someone else is working on it, we can avoid
duplicated effort, and to see if a CTE-based grouping sets implementation is
really the way we think we want to go.
I am plaing with it now :). I have a plan to replace CTE with similar
but explicit executor node. The main issue of existing patch was using
just transformation to CTE. I agree, so it isn't too much extensiable
in future. Now I am cleaning identifiers little bit. Any colaboration
is welcome.
My plan:
1) clean CTE patch
2) replace CTE with explicit executor node, but still based on tuplestore
3) when will be possible parallel processing based on hash agg - then
we don't need to use tuplestore
comments??
Regards
Pavel
Show quoted text
[1] http://archives.postgresql.org/pgsql-hackers/2009-05/msg00700.php
--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)iEYEARECAAYFAkxXrggACgkQRiRfCGf1UMMlCQCglaIdtPj8Qe6G60V2LHn5pFNn
kgIAniXRgIQEbVrK/eDVZnmKCzw33lT9
=XVVV
-----END PGP SIGNATURE-----
2010/8/3 Pavel Stehule <pavel.stehule@gmail.com>:
Hello
2010/8/3 Joshua Tolley <eggyknap@gmail.com>:
In case anyone's interested, I've taken the CTE-based grouping sets patch from
[1] and made it apply to 9.1, attached. I haven't yet done things like checked
it for whitespace consistency, style conformity, or anything else, but (tuits
permitting) hope to figure out how it works and get it closer to commitability
in some upcoming commitfest.I mention it here so that if someone else is working on it, we can avoid
duplicated effort, and to see if a CTE-based grouping sets implementation is
really the way we think we want to go.I am plaing with it now :). I have a plan to replace CTE with similar
but explicit executor node. The main issue of existing patch was using
just transformation to CTE. I agree, so it isn't too much extensiable
in future. Now I am cleaning identifiers little bit. Any colaboration
is welcome.My plan:
1) clean CTE patch
2) replace CTE with explicit executor node, but still based on tuplestore
3) when will be possible parallel processing based on hash agg - then
we don't need to use tuplestore
Couldn't you explain what exactly "explicit executor node"? I hope we
can share your image to develop it further than only transformation to
CTE.
Regards,
--
Hitoshi Harada
2010/8/3 Hitoshi Harada <umi.tanuki@gmail.com>:
2010/8/3 Pavel Stehule <pavel.stehule@gmail.com>:
Hello
2010/8/3 Joshua Tolley <eggyknap@gmail.com>:
In case anyone's interested, I've taken the CTE-based grouping sets patch from
[1] and made it apply to 9.1, attached. I haven't yet done things like checked
it for whitespace consistency, style conformity, or anything else, but (tuits
permitting) hope to figure out how it works and get it closer to commitability
in some upcoming commitfest.I mention it here so that if someone else is working on it, we can avoid
duplicated effort, and to see if a CTE-based grouping sets implementation is
really the way we think we want to go.I am plaing with it now :). I have a plan to replace CTE with similar
but explicit executor node. The main issue of existing patch was using
just transformation to CTE. I agree, so it isn't too much extensiable
in future. Now I am cleaning identifiers little bit. Any colaboration
is welcome.My plan:
1) clean CTE patch
2) replace CTE with explicit executor node, but still based on tuplestore
3) when will be possible parallel processing based on hash agg - then
we don't need to use tuplestoreCouldn't you explain what exactly "explicit executor node"? I hope we
can share your image to develop it further than only transformation to
CTE.
I have a one reason
Implementation based on CTE doesn't create space for possible
optimalisations (I think now, maybe it isn't true). It is good for
initial or referencial implementation - but it can be too complex,
when we will try to append some optimalizations - like parallel hash
agg processing, direct data reading without tuplestore. If you are, as
CTE author, thinking so these features are possible in non recursive
CTE too, I am not agains. I hope so this week I'll have a CTE based
patch - and we can talk about next direction. I see as possible
performance issue using a tuplestore - there are lot of cases where
repeating of source query can be faster.
If I remember well, Tom has a objection, so transformation to CTE is
too early - in parser. So It will be first change. Executor node can
be CTE.
regards
Pavel
p.s. I am sure, so there are lot of task, that can be solved together
with non recursive CTE.
Show quoted text
Regards,
--
Hitoshi Harada
On Mon, Aug 02, 2010 at 11:50:00PM -0600, Josh Tolley wrote:
In case anyone's interested, I've taken the CTE-based grouping sets
patch from [1] and made it apply to 9.1, attached. I haven't yet
done things like checked it for whitespace consistency, style
conformity, or anything else, but (tuits permitting) hope to figure
out how it works and get it closer to commitability in some upcoming
commitfest.I mention it here so that if someone else is working on it, we can
avoid duplicated effort, and to see if a CTE-based grouping sets
implementation is really the way we think we want to go.[1]
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00700.php
I've added back one file in the patch enclosed here. I'm still
getting compile fails from
CC="ccache gcc" ./configure --prefix=$PG_PREFIX --with-pgport=$PGPORT --with-perl --with-libxml --enable-debug --enable-cassert
make
Log from that also enclosed.
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Aug 03, 2010 at 12:58:03PM -0700, David Fetter wrote:
On Mon, Aug 02, 2010 at 11:50:00PM -0600, Josh Tolley wrote:
In case anyone's interested, I've taken the CTE-based grouping sets
patch from [1] and made it apply to 9.1, attached. I haven't yet
done things like checked it for whitespace consistency, style
conformity, or anything else, but (tuits permitting) hope to figure
out how it works and get it closer to commitability in some upcoming
commitfest.I mention it here so that if someone else is working on it, we can
avoid duplicated effort, and to see if a CTE-based grouping sets
implementation is really the way we think we want to go.[1]
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00700.phpI've added back one file in the patch enclosed here. I'm still
getting compile fails fromCC="ccache gcc" ./configure --prefix=$PG_PREFIX --with-pgport=$PGPORT --with-perl --with-libxml --enable-debug --enable-cassert
makeLog from that also enclosed.
Yeah, I seem to have done a poor job of producing the patch based on the
repository I was working from. That said, it seems Pavel's working actively on
a patch anyway, so perhaps my updating the old one isn't all that worthwhile.
Pavel, is your code somewhere that we can get to it?
--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
2010/8/3 Joshua Tolley <eggyknap@gmail.com>:
On Tue, Aug 03, 2010 at 12:58:03PM -0700, David Fetter wrote:
On Mon, Aug 02, 2010 at 11:50:00PM -0600, Josh Tolley wrote:
In case anyone's interested, I've taken the CTE-based grouping sets
patch from [1] and made it apply to 9.1, attached. I haven't yet
done things like checked it for whitespace consistency, style
conformity, or anything else, but (tuits permitting) hope to figure
out how it works and get it closer to commitability in some upcoming
commitfest.I mention it here so that if someone else is working on it, we can
avoid duplicated effort, and to see if a CTE-based grouping sets
implementation is really the way we think we want to go.[1]
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00700.phpI've added back one file in the patch enclosed here. I'm still
getting compile fails fromCC="ccache gcc" ./configure --prefix=$PG_PREFIX --with-pgport=$PGPORT --with-perl --with-libxml --enable-debug --enable-cassert
makeLog from that also enclosed.
Yeah, I seem to have done a poor job of producing the patch based on the
repository I was working from. That said, it seems Pavel's working actively on
a patch anyway, so perhaps my updating the old one isn't all that worthwhile.
Pavel, is your code somewhere that we can get to it?
not now. please wait a week.
Regards
Pavel
Show quoted text
--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)iEYEARECAAYFAkxYeiQACgkQRiRfCGf1UMPlEQCff+I4sCGtR+lzUs6Wb5JKi7Uu
3qYAnjLHzHzyMSHHX55QsphkaBbEJ0Zf
=uRqV
-----END PGP SIGNATURE-----
On Wed, Aug 04, 2010 at 04:44:05AM +0200, Pavel Stehule wrote:
Yeah, I seem to have done a poor job of producing the patch based on the
repository I was working from. That said, it seems Pavel's working actively on
a patch anyway, so perhaps my updating the old one isn't all that worthwhile.
Pavel, is your code somewhere that we can get to it?not now. please wait a week.
That works for me. I'm glad to try doing a better job of putting together my
version of the patch, if anyone thinks it's useful, but it seems that since
Pavel's code is due to appear sometime in the foreseeable future, there's not
much point in my doing that.
--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
2010/8/4 Joshua Tolley <eggyknap@gmail.com>:
On Wed, Aug 04, 2010 at 04:44:05AM +0200, Pavel Stehule wrote:
Yeah, I seem to have done a poor job of producing the patch based on the
repository I was working from. That said, it seems Pavel's working actively on
a patch anyway, so perhaps my updating the old one isn't all that worthwhile.
Pavel, is your code somewhere that we can get to it?not now. please wait a week.
That works for me. I'm glad to try doing a better job of putting together my
version of the patch, if anyone thinks it's useful, but it seems that since
Pavel's code is due to appear sometime in the foreseeable future, there's not
much point in my doing that.
I hope, so next week you can do own work on this job - I am not a
native speaker, and my code will need a checking and fixing comments
Regards
Pavel
Show quoted text
--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)iEYEARECAAYFAkxZkp8ACgkQRiRfCGf1UMMUcwCfcPayQbWRUYwhpCF1f24LsdD9
H/gAnRzCEq6yLX/RVLLi88ROhurOzbhK
=gUPx
-----END PGP SIGNATURE-----
On Thu, Aug 05, 2010 at 06:21:18AM +0200, Pavel Stehule wrote:
I hope, so next week you can do own work on this job - I am not a
native speaker, and my code will need a checking and fixing comments
I haven't entirely figured out how the code in the old patch works, but I
promise I *can* edit comments/docs :)
--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
Hello
I am sending a updated version.
i hope so there is more comments, longer and more descriptive
identifiers and I fixed a few bugs. But I found some new bugs :(
What is ok:
create table cars(name varchar, place varchar, count integer);
insert into cars values('skoda', 'czech rep.', 10000);
insert into cars values('skoda', 'germany', 5000);
insert into cars values('bmw', 'czech rep.', 100);
insert into cars values('bmw', 'germany', 1000);
insert into cars values('opel', 'czech rep.', 7000);
insert into cars values('opel', 'germany', 7000);
postgres=# select name, place, sum(count) from cars group by ();
name | place | sum
------+-------+-------
| | 30100
(1 row)
postgres=# select name, place, sum(count) from cars group by cube(name, place);
name | place | sum
-------+------------+-------
bmw | czech rep. | 100
skoda | germany | 5000
opel | czech rep. | 7000
opel | germany | 7000
skoda | czech rep. | 10000
bmw | germany | 1000
bmw | | 1100
skoda | | 15000
opel | | 14000
| germany | 13000
| czech rep. | 17100
| | 30100
(12 rows)
postgres=# select name, place, sum(count) from cars group by grouping
sets(name, place),();
name | place | sum
-------+------------+-------
bmw | | 1100
skoda | | 15000
opel | | 14000
| germany | 13000
| czech rep. | 17100
| | 30100
(6 rows)
postgres=# select name, place, sum(count) from cars group by grouping
sets(name, place,()),();
name | place | sum
-------+------------+-------
bmw | | 1100
skoda | | 15000
opel | | 14000
| germany | 13000
| czech rep. | 17100
| | 30100
(6 rows)
postgres=# select name, place, sum(count), grouping(name) from cars
group by grouping sets(name);
name | place | sum | grouping
-------+-------+-------+----------
bmw | | 1100 | 0
skoda | | 15000 | 0
opel | | 14000 | 0
(3 rows)
what is wrong:
postgres=# select name, place from cars group by ();
name | place
-------+------------
skoda | czech rep.
skoda | germany
bmw | czech rep.
bmw | germany
opel | czech rep.
opel | germany
(6 rows)
have to be NULL, NULL
postgres=# select name, place, sum(count), grouping(name) from cars
group by grouping sets(name) having grouping(name) = 1;
ERROR: unrecognized node type: 934
my rewriting rule is applied too late and maybe isn't optimal. I
replace a grouping(x) by const. maybe is better to use a variable.
Same issue is with ORDER BY clause.
So Joshua, can you look on code?
Regards
Pavel Stehule
2010/8/5 Joshua Tolley <eggyknap@gmail.com>:
Show quoted text
On Thu, Aug 05, 2010 at 06:21:18AM +0200, Pavel Stehule wrote:
I hope, so next week you can do own work on this job - I am not a
native speaker, and my code will need a checking and fixing commentsI haven't entirely figured out how the code in the old patch works, but I
promise I *can* edit comments/docs :)--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)iEYEARECAAYFAkxaSjEACgkQRiRfCGf1UMM9dQCZASYJUmXLe5i7L4aQnMicwMfy
cu8An3fMdR/ISezw5YV3KsCAOM+BILO1
=uZb+
-----END PGP SIGNATURE-----
Attachments:
verze002.difftext/x-patch; charset=US-ASCII; name=verze002.diffDownload+1372-64
On Thu, Aug 05, 2010 at 04:46:51PM +0200, Pavel Stehule wrote:
So Joshua, can you look on code?
Sure... thanks :)
--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
I found other issue :(
postgres=# select name, place from cars group by grouping sets(name, place,());
name | place
-------+------------
bmw |
skoda |
opel |
| germany
| czech rep.
skoda | czech rep.
skoda | germany
bmw | czech rep.
bmw | germany
opel | czech rep.
opel | germany
(11 rows)
postgres=# explain select name, place from cars group by grouping
sets(name, place,());
QUERY PLAN
------------------------------------------------------------------------------
Append (cost=36.98..88.55 rows=1230 width=54)
CTE GroupingSets
-> Seq Scan on cars (cost=0.00..18.30 rows=830 width=68)
-> HashAggregate (cost=18.68..20.68 rows=200 width=32)
-> CTE Scan on "GroupingSets" (cost=0.00..16.60 rows=830 width=32)
-> HashAggregate (cost=18.68..20.68 rows=200 width=32)
-> CTE Scan on "GroupingSets" (cost=0.00..16.60 rows=830 width=32)
-> CTE Scan on "GroupingSets" (cost=0.00..16.60 rows=830 width=64)
(8 rows)
the combination of nonagregates and empty sets do a problems - because
we can't ensure agg mode without aggregates or group by. But it is
only minor issue
2010/8/5 Joshua Tolley <eggyknap@gmail.com>:
Show quoted text
On Thu, Aug 05, 2010 at 04:46:51PM +0200, Pavel Stehule wrote:
So Joshua, can you look on code?
Sure... thanks :)
--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)iEYEARECAAYFAkxa1NsACgkQRiRfCGf1UMPwzQCgjz52P86Yx4ac4aRkKwjn8OHK
6/EAoJ/CjXEyPaLpx39SI5bKQPz+AwBR
=Mi2J
-----END PGP SIGNATURE-----
On Thu, Aug 05, 2010 at 04:46:51PM +0200, Pavel Stehule wrote:
I am sending a updated version.
I've been looking at the changes to gram.y, and noted the comment under func_expr
where you added CUBE and ROLLUP definitions. It says that CUBE can't be a
reserved keyword because it's already used in the cube contrib module. But
then the changes to kwlist.h include this:
+ PG_KEYWORD("cube", CUBE, RESERVED_KEYWORD)
...
+ PG_KEYWORD("rollup", ROLLUP, RESERVED_KEYWORD)
...and CUBE and ROLLUP are added in gram.y under the reserved_keyword list. I
realize things like CURRENT_TIME, that also have special entries in the
func_expr grammar, are also reserved keywords, but this all seems at odds with
the comment. What am I missing? Is the comment simply pointing out that the
designation of CUBE and ROLLUP as reserved keywords will have to change at
some point, but it hasn't been implemented yet (or no one has figured out how
to do it)?
--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
2010/8/7 Joshua Tolley <eggyknap@gmail.com>:
On Thu, Aug 05, 2010 at 04:46:51PM +0200, Pavel Stehule wrote:
I am sending a updated version.
I've been looking at the changes to gram.y, and noted the comment under func_expr
where you added CUBE and ROLLUP definitions. It says that CUBE can't be a
reserved keyword because it's already used in the cube contrib module. But
then the changes to kwlist.h include this:
I am little bit confused now - it's bad comment - and I have to verify
it. What I remember, we cannot to use a two parser's rules, because it
going to a conflict. So there have to be used a trick with a moving to
decision to transform stage, where we have a context info. I have to
recheck a minimal level - probably it can't be a RESERVED_KEYWORD.
Because then we can't to create a function "cube".
Show quoted text
+ PG_KEYWORD("cube", CUBE, RESERVED_KEYWORD) ... + PG_KEYWORD("rollup", ROLLUP, RESERVED_KEYWORD)...and CUBE and ROLLUP are added in gram.y under the reserved_keyword list. I
realize things like CURRENT_TIME, that also have special entries in the
func_expr grammar, are also reserved keywords, but this all seems at odds with
the comment. What am I missing? Is the comment simply pointing out that the
designation of CUBE and ROLLUP as reserved keywords will have to change at
some point, but it hasn't been implemented yet (or no one has figured out how
to do it)?--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)iEYEARECAAYFAkxcjSIACgkQRiRfCGf1UMPpCwCcCHBh/1NiLykIcVYgPyfbIegF
xq0AoID75rCPiW8yf29OSkaJVza1FQt5
=PcLs
-----END PGP SIGNATURE-----
Hello
I was confused when I though so I found a solution of 1 shift/reduce conflict :(
All identificators used for buildin functions have to be a
col_name_keywords or reserved keyword. There is conflict with our
(probably obsolete) feature SELECT colname(tabname). So for this
moment the real solution is removing CUBE and ROLLUP from keywords and
dynamically testing a funcname in transformation stage - what is
slower and more ugly.
ideas?
Regards
Pavel Stehule
2010/8/7 Pavel Stehule <pavel.stehule@gmail.com>:
Show quoted text
2010/8/7 Joshua Tolley <eggyknap@gmail.com>:
On Thu, Aug 05, 2010 at 04:46:51PM +0200, Pavel Stehule wrote:
I am sending a updated version.
I've been looking at the changes to gram.y, and noted the comment under func_expr
where you added CUBE and ROLLUP definitions. It says that CUBE can't be a
reserved keyword because it's already used in the cube contrib module. But
then the changes to kwlist.h include this:I am little bit confused now - it's bad comment - and I have to verify
it. What I remember, we cannot to use a two parser's rules, because it
going to a conflict. So there have to be used a trick with a moving to
decision to transform stage, where we have a context info. I have to
recheck a minimal level - probably it can't be a RESERVED_KEYWORD.
Because then we can't to create a function "cube".+ PG_KEYWORD("cube", CUBE, RESERVED_KEYWORD) ... + PG_KEYWORD("rollup", ROLLUP, RESERVED_KEYWORD)...and CUBE and ROLLUP are added in gram.y under the reserved_keyword list. I
realize things like CURRENT_TIME, that also have special entries in the
func_expr grammar, are also reserved keywords, but this all seems at odds with
the comment. What am I missing? Is the comment simply pointing out that the
designation of CUBE and ROLLUP as reserved keywords will have to change at
some point, but it hasn't been implemented yet (or no one has figured out how
to do it)?--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)iEYEARECAAYFAkxcjSIACgkQRiRfCGf1UMPpCwCcCHBh/1NiLykIcVYgPyfbIegF
xq0AoID75rCPiW8yf29OSkaJVza1FQt5
=PcLs
-----END PGP SIGNATURE-----
Hello
I found a break in GROUPING SETS implementation. Now I am playing with
own executor and planner node and I can't to go forward :(. Probably
this feature will need a significant update of our agg implementation.
Probably needs a some similar structure like CTE but it can be a
little bit reduced - there are a simple relation between source query
and result query - I am not sure, if this has to be implemented via
subqueries? The second question is relative big differencies between
GROUP BY behave and GROUP BY GROUPING SETS behave. Now I don't know
about way to join GROUP BY and GROUPING SETS together
Any ideas welcome
Regards
Pavel