WIP Patch for GROUPING SETS phase 1
This is phase 1 (of either 2 or 3) of implementation of the standard
GROUPING SETS feature, done by Andrew Gierth and myself.
Unlike previous attempts at this feature, we make no attempt to do
any serious work in the parser; we perform some minor syntactic
simplifications described in the spec, such as removing excess parens,
but the original query structure is preserved in views and so on.
So far, we have done most of the actual work in the executor, but
further phases will concentrate on the planner. We have not yet
tackled the hard problem of generating plans that require multiple
passes over the same input data; see below regarding design issues.
What works so far:
- all the standard syntax is accepted (but many combinations are not
plannable yet)
- while the spec only allows column references in GROUP BY, we
continue to allow arbitrary expressions
- grouping sets which can be computed in a single pass over sorted
data (i.e. anything that can be reduced to simple columns plus one
ROLLUP clause, regardless of how it was specified in the query), are
implemented as part of the existing GroupAggregate executor node
- all kinds of aggregate functions, including ordered set functions
and user-defined aggregates, are supported in conjunction with
grouping sets (no API changes, other than one caveat about fn_extra)
- the GROUPING() operation defined in the spec is implemented,
including support for multiple args, and supports arbitrary
expressions as an extension to the spec
Changes/incompatibilities:
- the big compatibility issue: CUBE and ROLLUP are now partially
reserved (col_name_keyword), which breaks contrib/cube. A separate
patch for contrib/ is attached that renames the cube type to "cube"; a
new name really needs to be chosen.
- GROUPING is now a fully reserved word, and SETS is an unreserved keyword
- GROUP BY (a,b) now means GROUP BY a,b (as required by spec).
GROUP BY ROW(a,b) still has the old meaning.
- GROUP BY () is now supported too.
- fn_extra for aggregate calls is per-call-site and NOT
per-transition-value - the same fn_extra will be used for interleaved
calls to the transition function with different transition values.
fn_extra, if used at all, should be used only for per-call-site info
such as data types, as clarified in the 9.4beta changes to the ordered
set function implementation.
Future work:
We envisage that handling of arbitrary grouping sets will be best
done by having the planner generating an Append of multiple
aggregation paths, presumably with some way of moving the original
input path to a CTE. We have not really explored yet how hard this
will be; suggestions are welcome.
In the executor, it is obviously possible to extend HashAggregate to
handle arbitrary collections of grouping sets, but even if the memory
usage issue were solved, this would leave the question of what to do
with non-hashable data types, so it seems that the planner work
probably can't be avoided.
A new name needs to be found for the "cube" data type.
At this point we are more interested in design review rather than
necessarily committing this patch in its current state. However,
committing it may make future work easier; we leave that question
open.
Regards,
Atri
Attachments:
groupingsets_ver1.patchtext/x-diff; charset=US-ASCII; name=groupingsets_ver1.patchDownload+3248-477
On Thu, Aug 14, 2014 at 12:07 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
This is phase 1 (of either 2 or 3) of implementation of the standard GROUPING SETS feature, done by Andrew Gierth and myself.
Unlike previous attempts at this feature, we make no attempt to do any serious work in the parser; we perform some minor syntactic simplifications described in the spec, such as removing excess parens, but the original query structure is preserved in views and so on.
So far, we have done most of the actual work in the executor, but further phases will concentrate on the planner. We have not yet tackled the hard problem of generating plans that require multiple passes over the same input data; see below regarding design issues.
What works so far:
- all the standard syntax is accepted (but many combinations are not plannable yet)
- while the spec only allows column references in GROUP BY, we continue to allow arbitrary expressions
- grouping sets which can be computed in a single pass over sorted data (i.e. anything that can be reduced to simple columns plus one ROLLUP clause, regardless of how it was specified in the query), are implemented as part of the existing GroupAggregate executor node
- all kinds of aggregate functions, including ordered set functions and user-defined aggregates, are supported in conjunction with grouping sets (no API changes, other than one caveat about fn_extra)
- the GROUPING() operation defined in the spec is implemented, including support for multiple args, and supports arbitrary expressions as an extension to the spec
Changes/incompatibilities:
- the big compatibility issue: CUBE and ROLLUP are now partially reserved (col_name_keyword), which breaks contrib/cube. A separate patch for contrib/ is attached that renames the cube type to "cube"; a new name really needs to be chosen.
- GROUPING is now a fully reserved word, and SETS is an unreserved keyword
- GROUP BY (a,b) now means GROUP BY a,b (as required by spec). GROUP BY ROW(a,b) still has the old meaning.
- GROUP BY () is now supported too.
- fn_extra for aggregate calls is per-call-site and NOT per-transition-value - the same fn_extra will be used for interleaved calls to the transition function with different transition values. fn_extra, if used at all, should be used only for per-call-site info such as data types, as clarified in the 9.4beta changes to the ordered set function implementation.
Future work:
We envisage that handling of arbitrary grouping sets will be best done by having the planner generating an Append of multiple aggregation paths, presumably with some way of moving the original input path to a CTE. We have not really explored yet how hard this will be; suggestions are welcome.
In the executor, it is obviously possible to extend HashAggregate to handle arbitrary collections of grouping sets, but even if the memory usage issue were solved, this would leave the question of what to do with non-hashable data types, so it seems that the planner work probably can't be avoided.
A new name needs to be found for the "cube" data type.
At this point we are more interested in design review rather than necessarily committing this patch in its current state. However, committing it may make future work easier; we leave that question open.
Sorry, forgot to attach the patch for fixing cube in contrib, which breaks
since we now reserve "cube" keyword. Please find attached the same.
Regards,
Atri
Attachments:
cube_contribfixes.patchtext/x-diff; charset=US-ASCII; name=cube_contribfixes.patchDownload+1457-1422
On 08/13/2014 09:43 PM, Atri Sharma wrote:
Sorry, forgot to attach the patch for fixing cube in contrib, which breaks
since we now reserve "cube" keyword. Please find attached the same.
Ugh, that will make everyone using the cube extension unhappy. After
this patch, they will have to quote contrib's cube type and functions
every time.
I think we should bite the bullet and rename the extension, and its
"cube" type and functions. For an application, having to suddenly quote
it has the same effect as renaming it; you'll have to find all the
callers and change them. And in the long-run, it's clearly better to
have an unambiguous name.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
"Heikki" == Heikki Linnakangas <hlinnakangas@vmware.com> writes:
On 08/13/2014 09:43 PM, Atri Sharma wrote:
Sorry, forgot to attach the patch for fixing cube in contrib,
which breaks since we now reserve "cube" keyword. Please find
attached the same.
Heikki> Ugh, that will make everyone using the cube extension
Heikki> unhappy. After this patch, they will have to quote contrib's
Heikki> cube type and functions every time.
Heikki> I think we should bite the bullet and rename the extension,
I agree, the contrib/cube patch as posted is purely so we could test
everything without having to argue over the new name first. (And it
is posted separately from the main patch because of its length and
utter boringness.)
However, even if/when a new name is chosen, there's the question of
how to make the upgrade path easiest. Once CUBE is reserved,
up-to-date pg_dump will quote all uses of the "cube" type and function
when dumping an older database (except inside function bodies of
course), so there may be merit in keeping a "cube" domain over the new
type, and maybe also merit in keeping the extension name.
So what's the new type name going to be? cuboid? hypercube?
geometric_cube? n_dimensional_box?
--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
A progress update:
Atri> We envisage that handling of arbitrary grouping sets will be
Atri> best done by having the planner generating an Append of
Atri> multiple aggregation paths, presumably with some way of moving
Atri> the original input path to a CTE. We have not really explored
Atri> yet how hard this will be; suggestions are welcome.
This idea was abandoned.
Instead, we have implemented full support for arbitrary grouping sets
by means of a chaining system:
explain (verbose, costs off) select four, ten, hundred, count(*) from onek group by cube(four,ten,hundred);
QUERY PLAN
-----------------------------------------------------------------------------------------------------
GroupAggregate
Output: four, ten, hundred, count(*)
Grouping Sets: (onek.hundred, onek.four, onek.ten), (onek.hundred, onek.four), (onek.hundred), ()
-> Sort
Output: four, ten, hundred
Sort Key: onek.hundred, onek.four, onek.ten
-> ChainAggregate
Output: four, ten, hundred
Grouping Sets: (onek.ten, onek.hundred), (onek.ten)
-> Sort
Output: four, ten, hundred
Sort Key: onek.ten, onek.hundred
-> ChainAggregate
Output: four, ten, hundred
Grouping Sets: (onek.four, onek.ten), (onek.four)
-> Sort
Output: four, ten, hundred
Sort Key: onek.four, onek.ten
-> Seq Scan on public.onek
Output: four, ten, hundred
(20 rows)
The ChainAggregate nodes use a tuplestore to communicate with the
GroupAggregate node at the top of the chain; they pass through input
tuples unchanged, and write aggregated result rows to the tuplestore,
which the top node then returns once it has finished its own result.
The organization of the planner code seems to be actively hostile to
any attempt to break out new CTEs on the fly, or to plan parts of the
query more than once; the method above seems to be the easiest way to
avoid those issues.
Atri> At this point we are more interested in design review rather
Atri> than necessarily committing this patch in its current state.
This no longer applies; we expect to post within a day or two an
updated patch with full functionality.
--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 08/21/2014 01:28 PM, Andrew Gierth wrote:
A progress update:
Atri> We envisage that handling of arbitrary grouping sets will be
Atri> best done by having the planner generating an Append of
Atri> multiple aggregation paths, presumably with some way of moving
Atri> the original input path to a CTE. We have not really explored
Atri> yet how hard this will be; suggestions are welcome.This idea was abandoned.
Instead, we have implemented full support for arbitrary grouping sets
by means of a chaining system:explain (verbose, costs off) select four, ten, hundred, count(*) from onek group by cube(four,ten,hundred);
QUERY PLAN
-----------------------------------------------------------------------------------------------------
GroupAggregate
Output: four, ten, hundred, count(*)
Grouping Sets: (onek.hundred, onek.four, onek.ten), (onek.hundred, onek.four), (onek.hundred), ()
-> Sort
Output: four, ten, hundred
Sort Key: onek.hundred, onek.four, onek.ten
-> ChainAggregate
Output: four, ten, hundred
Grouping Sets: (onek.ten, onek.hundred), (onek.ten)
-> Sort
Output: four, ten, hundred
Sort Key: onek.ten, onek.hundred
-> ChainAggregate
Output: four, ten, hundred
Grouping Sets: (onek.four, onek.ten), (onek.four)
-> Sort
Output: four, ten, hundred
Sort Key: onek.four, onek.ten
-> Seq Scan on public.onek
Output: four, ten, hundred
(20 rows)
Uh, that's ugly. The EXPLAIN out I mean; as an implementation detail
chaining the nodes might be reasonable. But the above gets unreadable if
you have more than a few grouping sets.
The ChainAggregate nodes use a tuplestore to communicate with the
GroupAggregate node at the top of the chain; they pass through input
tuples unchanged, and write aggregated result rows to the tuplestore,
which the top node then returns once it has finished its own result.
Hmm, so there's a "magic link" between the GroupAggregate at the top and
all the ChainAggregates, via the tuplestore. That may be fine, we have
special rules in passing information between bitmap scan nodes too.
But rather than chain multiple ChainAggregate nodes, how about just
doing all the work in the top GroupAggregate node?
Atri> At this point we are more interested in design review rather
Atri> than necessarily committing this patch in its current state.This no longer applies; we expect to post within a day or two an
updated patch with full functionality.
Ok, cool
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
"Heikki" == Heikki Linnakangas <hlinnakangas@vmware.com> writes:
Heikki> I think we should bite the bullet and rename the extension,
I agree, the contrib/cube patch as posted is purely so we could test
everything without having to argue over the new name first.
I wonder if you've tried hard enough to avoid reserving the keyword.
I think that the cube extension is not going to be the only casualty
if "cube" becomes a reserved word --- that seems like a name that
could be in use in lots of applications. ("What do you mean, 9.5
breaks our database for tracking office space?") It would be worth
quite a bit of effort to avoid that.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
"Heikki" == Heikki Linnakangas <hlinnakangas@vmware.com> writes:
Heikki> Uh, that's ugly. The EXPLAIN out I mean; as an implementation
Heikki> detail chaining the nodes might be reasonable. But the above
Heikki> gets unreadable if you have more than a few grouping sets.
It's good for highlighting performance issues in EXPLAIN, too.
4096 grouping sets takes about a third of a second to plan and execute,
but something like a minute to generate the EXPLAIN output. However,
for more realistic sizes, plan time is not significant and explain
takes only about 40ms for 256 grouping sets.
(To avoid resource exhaustion issues, we have set a limit of,
currently, 4096 grouping sets per query level. Without such a limit,
it is easy to write queries that would take TBs of memory to parse or
plan. MSSQL and DB2 have similar limits, I'm told.)
The ChainAggregate nodes use a tuplestore to communicate with the
GroupAggregate node at the top of the chain; they pass through input
tuples unchanged, and write aggregated result rows to the tuplestore,
which the top node then returns once it has finished its own result.
Heikki> Hmm, so there's a "magic link" between the GroupAggregate at
Heikki> the top and all the ChainAggregates, via the tuplestore. That
Heikki> may be fine, we have special rules in passing information
Heikki> between bitmap scan nodes too.
Eh. It's far from a perfect solution, but the planner doesn't lend itself
to perfect solutions.
Heikki> But rather than chain multiple ChainAggregate nodes, how
Heikki> about just doing all the work in the top GroupAggregate node?
It was easier this way. (How would you expect to do it all in the top
node when each subset of the grouping sets list needs to see the data
in a different order?)
--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2014-08-21 17:00 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
"Heikki" == Heikki Linnakangas <hlinnakangas@vmware.com> writes:
Heikki> I think we should bite the bullet and rename the extension,I agree, the contrib/cube patch as posted is purely so we could test
everything without having to argue over the new name first.I wonder if you've tried hard enough to avoid reserving the keyword.
I think that the cube extension is not going to be the only casualty
if "cube" becomes a reserved word --- that seems like a name that
could be in use in lots of applications. ("What do you mean, 9.5
breaks our database for tracking office space?") It would be worth
quite a bit of effort to avoid that.
My prototypes worked without reserved keywords if I remember well
but analyzer is relatively complex
Pavel
Show quoted text
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
I agree, the contrib/cube patch as posted is purely so we could test
everything without having to argue over the new name first.
Tom> I wonder if you've tried hard enough to avoid reserving the keyword.
GROUP BY cube(a,b) is currently legal syntax and means something completely
incompatible to what the spec requires.
GROUP BY GROUPING SETS (cube(a,b), c) -- is that cube(a,b) an expression
to group on, or a list of grouping sets to expand?
GROUP BY (cube(a,b)) -- should that be an error, or silently treat it
as a function call rather than a grouping set? What about GROUP BY
GROUPING SETS ((cube(a,b)) ? (both are errors in our patch)
Accepting those as valid implies a degree of possible confusion that I
personally regard as quite questionable. Previous discussion seemed to
have accepted that contrib/cube was going to have to be renamed.
Tom> I think that the cube extension is not going to be the only
Tom> casualty if "cube" becomes a reserved word --- that seems like a
Tom> name that could be in use in lots of applications. ("What do
Tom> you mean, 9.5 breaks our database for tracking office space?")
Tom> It would be worth quite a bit of effort to avoid that.
It has been a reserved word in the spec since, what, 1999? and it is a
reserved word in mssql, oracle, db2, etc.?
It only needs to be a col_name_keyword, so it still works as a table
or column name (as usual we are less strict than the spec in that
respect). I'm looking into whether it can be made unreserved, but I
have serious doubts about this being a good idea.
--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2014-08-21 17:58 GMT+02:00 Andrew Gierth <andrew@tao11.riddles.org.uk>:
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
I agree, the contrib/cube patch as posted is purely so we could test
everything without having to argue over the new name first.Tom> I wonder if you've tried hard enough to avoid reserving the keyword.
GROUP BY cube(a,b) is currently legal syntax and means something
completely
incompatible to what the spec requires.GROUP BY GROUPING SETS (cube(a,b), c) -- is that cube(a,b) an expression
to group on, or a list of grouping sets to expand?GROUP BY (cube(a,b)) -- should that be an error, or silently treat it
as a function call rather than a grouping set? What about GROUP BY
GROUPING SETS ((cube(a,b)) ? (both are errors in our patch)Accepting those as valid implies a degree of possible confusion that I
personally regard as quite questionable. Previous discussion seemed to
have accepted that contrib/cube was going to have to be renamed.Tom> I think that the cube extension is not going to be the only
Tom> casualty if "cube" becomes a reserved word --- that seems like a
Tom> name that could be in use in lots of applications. ("What do
Tom> you mean, 9.5 breaks our database for tracking office space?")
Tom> It would be worth quite a bit of effort to avoid that.It has been a reserved word in the spec since, what, 1999? and it is a
reserved word in mssql, oracle, db2, etc.?It only needs to be a col_name_keyword, so it still works as a table
or column name (as usual we are less strict than the spec in that
respect). I'm looking into whether it can be made unreserved, but I
have serious doubts about this being a good idea.
+1
contrib module should be renamed - more - current name is confusing against
usual functionality related to words CUBE and ROLLUP
Pavel
Show quoted text
--
Andrew (irc:RhodiumToad)--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> I wonder if you've tried hard enough to avoid reserving the keyword.
GROUP BY cube(a,b) is currently legal syntax and means something completely
incompatible to what the spec requires.
Well, if there are any extant applications that use that exact phrasing,
they're going to be broken in any case. That does not mean that we have
to break every other appearance of "cube". I think that special-casing
appearances of cube(...) in GROUP BY lists might be a feasible approach.
Basically, I'm afraid that unilaterally renaming cube is going to break
enough applications that there will be more people who flat out don't
want this patch than there will be who get benefit from it, and we end
up voting to revert the feature altogether. If you'd like to take that
risk then feel free to charge full steam ahead, but don't say you were
not warned. And don't bother arguing that CUBE is reserved according to
the standard, because that will not make one damn bit of difference
to the people who will be unhappy.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 21, 2014 at 1:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> I wonder if you've tried hard enough to avoid reserving the keyword.GROUP BY cube(a,b) is currently legal syntax and means something completely
incompatible to what the spec requires.Well, if there are any extant applications that use that exact phrasing,
they're going to be broken in any case. That does not mean that we have
to break every other appearance of "cube". I think that special-casing
appearances of cube(...) in GROUP BY lists might be a feasible approach.Basically, I'm afraid that unilaterally renaming cube is going to break
enough applications that there will be more people who flat out don't
want this patch than there will be who get benefit from it, and we end
up voting to revert the feature altogether. If you'd like to take that
risk then feel free to charge full steam ahead, but don't say you were
not warned. And don't bother arguing that CUBE is reserved according to
the standard, because that will not make one damn bit of difference
to the people who will be unhappy.
I have to respectfully disagree. Certainly, if there is some
reasonable way to not have to change 'cube' then great. But the
tonnage rule applies here: even considering compatibility issues, when
considering the importance of standard SQL (and, I might add,
exceptionally useful) syntax and a niche extension, 'cube' is going to
have to get out of the way. There are view valid reasons to break
compatibility but blocking standard syntax is definitely one of them.
merlin
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 08/21/2014 02:48 PM, Merlin Moncure wrote:
Basically, I'm afraid that unilaterally renaming cube is going to break
enough applications that there will be more people who flat out don't
want this patch than there will be who get benefit from it, and we end
up voting to revert the feature altogether. If you'd like to take that
risk then feel free to charge full steam ahead, but don't say you were
not warned. And don't bother arguing that CUBE is reserved according to
the standard, because that will not make one damn bit of difference
to the people who will be unhappy.I have to respectfully disagree. Certainly, if there is some
reasonable way to not have to change 'cube' then great. But the
tonnage rule applies here: even considering compatibility issues, when
considering the importance of standard SQL (and, I might add,
exceptionally useful) syntax and a niche extension, 'cube' is going to
have to get out of the way. There are view valid reasons to break
compatibility but blocking standard syntax is definitely one of them.
I'm inclined to think that the audience for this is far larger than the
audience for the cube extension, which I have not once encountered in
the field.
But I guess we all have different experiences.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> writes:
I'm inclined to think that the audience for this is far larger than the
audience for the cube extension, which I have not once encountered in
the field.
Perhaps so. I would really prefer not to have to get into estimating
how many people will be inconvenienced how badly. It's clear to me
that not a lot of sweat has been put into seeing if we can avoid
reserving the keyword, and I think we need to put in that effort.
We've jumped through some pretty high hoops to avoid reserving keywords in
the past, so I don't think this patch should get a free pass on the issue.
Especially considering that renaming the cube extension isn't exactly
going to be zero work: there is no infrastructure for such a thing.
A patch consisting merely of s/cube/foobar/g isn't going to cut it.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
I'm inclined to think that the audience for this is far larger than the
audience for the cube extension, which I have not once encountered in
the field.
+1
Perhaps so. I would really prefer not to have to get into estimating
how many people will be inconvenienced how badly. It's clear to me
that not a lot of sweat has been put into seeing if we can avoid
reserving the keyword, and I think we need to put in that effort.
I'm with Merlin on this one, it's going to end up happening and I don't
know that 9.5 is any worse than post-9.5 to make this change.
We've jumped through some pretty high hoops to avoid reserving keywords in
the past, so I don't think this patch should get a free pass on the issue.
This doesn't feel like an attempt to get a free pass on anything- it's
not being proposed as fully reserved and there is spec-defined syntax
which needs to be supported. If we can get away with keeping it
unreserved while not making it utterly confusing for users and
convoluting the code, great, but that doesn't seem likely to pan out.
Especially considering that renaming the cube extension isn't exactly
going to be zero work: there is no infrastructure for such a thing.
A patch consisting merely of s/cube/foobar/g isn't going to cut it.
This is a much more interesting challenge to deal with, but perhaps we
could include a perl script or pg_upgrade snippet for users to run to
see if they have the extension installed and to do some magic before the
actual upgrade to handle the rename..?
Thanks,
Stephen
On Thu, Aug 21, 2014 at 06:15:33PM -0400, Stephen Frost wrote:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
I'm inclined to think that the audience for this is far larger than the
audience for the cube extension, which I have not once encountered in
the field.+1
I haven't seen it in the field either.
I'd also like to mention that the mere presence of a module in our
contrib/ directory can reflect bad decisions that need reversing just
as easily as it can the presence of vitally important utilities that
need to be preserved. I'm pretty sure the cube extension arrived
after the CUBE keyword in SQL, which makes that an error on our part
if true.
Especially considering that renaming the cube extension isn't
exactly going to be zero work: there is no infrastructure for such
a thing. A patch consisting merely of s/cube/foobar/g isn't going
to cut it.This is a much more interesting challenge to deal with, but perhaps
we could include a perl script or pg_upgrade snippet for users to
run to see if they have the extension installed and to do some magic
before the actual upgrade to handle the rename..?
+1 for doing this. Do we want to make some kind of generator for such
things? It doesn't seem hard in principle, but I haven't tried coding
it up yet.
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
"Stephen" == Stephen Frost <sfrost@snowman.net> writes:
I'm inclined to think that the audience for this is far larger
than the audience for the cube extension, which I have not once
encountered in the field.
Stephen> +1
Most of my encounters with cube have been me suggesting it to people
on IRC as a possible approach for solving certain kinds of performance
problems by converting them to N-dimensional spatial containment
queries. Sometimes this works well, but it doesn't seem to be an
approach that many people discover on their own.
We've jumped through some pretty high hoops to avoid reserving
keywords in the past, so I don't think this patch should get a
free pass on the issue.
Stephen> This doesn't feel like an attempt to get a free pass on
Stephen> anything- it's not being proposed as fully reserved and
Stephen> there is spec-defined syntax which needs to be supported.
Stephen> If we can get away with keeping it unreserved while not
Stephen> making it utterly confusing for users and convoluting the
Stephen> code, great, but that doesn't seem likely to pan out.
Having now spent some more time looking, I believe there is a solution
which makes it unreserved which does not require any significant pain
in the code. I'm not entirely convinced that this is the right
approach in the long term, but it might allow for a more planned
transition.
The absolute minimum seems to be:
GROUPING as a col_name_keyword (since GROUPING(x,y,...) in the select
list as a <grouping operation> looks like a function call for any
argument types)
CUBE, ROLLUP, SETS as unreserved_keyword
--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Perhaps so. I would really prefer not to have to get into
Tom> estimating how many people will be inconvenienced how badly.
Tom> It's clear to me that not a lot of sweat has been put into
Tom> seeing if we can avoid reserving the keyword, and I think we
Tom> need to put in that effort.
So, first nontrivial issue that crops up is this: if CUBE is
unreserved, then ruleutils will output the string "cube(a,b)" for a
function call to a function named "cube", on the assumption that it
will parse back as a single unit (which inside a GROUP BY is not
true).
Options:
1) when outputting GROUP BY clauses (and nothing else), put parens
around anything that isn't provably atomic; or put parens around
anything that might look like a function call; or put parens around
anything that looks like a function call with a keyword name
2) when outputting any function call, add parens if the name is an
unreserved keyword
3) when outputting any function call, quote the name if it is an
unreserved keyword
4) something else?
(This of course means that if someone has a cube() function call in
a group by clause of a view, then upgrading will change the meaning
of the view and possibly fail to create it; there seems to be no fix
for this, not even using the latest pg_dump, since pg_dump relies on
the old server's ruleutils)
--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
* Andrew Gierth (andrew@tao11.riddles.org.uk) wrote:
Having now spent some more time looking, I believe there is a solution
which makes it unreserved which does not require any significant pain
in the code. I'm not entirely convinced that this is the right
approach in the long term, but it might allow for a more planned
transition.The absolute minimum seems to be:
GROUPING as a col_name_keyword (since GROUPING(x,y,...) in the select
list as a <grouping operation> looks like a function call for any
argument types)CUBE, ROLLUP, SETS as unreserved_keyword
This means
GROUP BY cube(x,y)
GROUP BY (cube(x,y))
GROUP BY cube(x)
all end up with different meanings though, right?
I'm not sure that's really a better situation.
Thanks,
Stephen