GROUPING

Started by David Fetterover 10 years ago16 messages
#1David Fetter
david@fetter.org

Folks,

While kicking the tires on the new GROUPING() feature, I noticed that
NUMERIC has no cast to bit(n). GROUPING() produces essentially a
bitmap, although the standard mandates for some reason that it be a
numeric type.

I was thinking it should produce NUMERIC rather than int4 as it does
now in order to accommodate large numbers of columns, but the
usefulness of the bitmap is greatly increased if there's a simple CAST
to bit(n).

Contravening the spec, but much more usefully, GROUPING should
probably produce a (possibly ordered) set of key-value pairs.
Alternatively, we could create something like GROUPING_JSON().

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#1)
Re: GROUPING

David Fetter <david@fetter.org> writes:

While kicking the tires on the new GROUPING() feature, I noticed that
NUMERIC has no cast to bit(n). GROUPING() produces essentially a
bitmap, although the standard mandates for some reason that it be a
numeric type.

I was thinking it should produce NUMERIC rather than int4 as it does
now in order to accommodate large numbers of columns, but the
usefulness of the bitmap is greatly increased if there's a simple CAST
to bit(n).

Maybe INT8 would be a better choice than INT4? But I'm not sure there's
any practical use-case for more than 30 grouping sets anyway. Keep in
mind the actual output volume probably grows like 2^N.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#2)
Re: GROUPING

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

I was thinking it should produce NUMERIC rather than int4 as it does
now in order to accommodate large numbers of columns, but the
usefulness of the bitmap is greatly increased if there's a simple
CAST to bit(n).

Tom> Maybe INT8 would be a better choice than INT4? But I'm not sure
Tom> there's any practical use-case for more than 30 grouping sets
Tom> anyway. Keep in mind the actual output volume probably grows like
Tom> 2^N.

Spec says "The declared type of the result is exact numeric with an
implementation-defined precision and a scale of 0 (zero)." for what
that's worth. It doesn't give any hint that I can see for the max
number of columns; it just defines grouping(a...,z) as being equal to
2*grouping(a...) + grouping(z).

But the number of grouping sets isn't really relevant here, rather the
number of columns used for grouping.

In any case, if 31 isn't enough for you, you can call it multiple times:

select ..., grouping(a,b,...,z), grouping(a1,b1,...z1), ...

I didn't think >31 columns would be an issue, but changing it to bigint
is of course trivial if anyone thinks it necessary.

A possibly more interesting question is whether any other db products
have operations like GROUPING() that we could usefully support?

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#2)
Re: GROUPING

On 20 May 2015 at 19:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Fetter <david@fetter.org> writes:

While kicking the tires on the new GROUPING() feature, I noticed that
NUMERIC has no cast to bit(n). GROUPING() produces essentially a
bitmap, although the standard mandates for some reason that it be a
numeric type.

I was thinking it should produce NUMERIC rather than int4 as it does
now in order to accommodate large numbers of columns, but the
usefulness of the bitmap is greatly increased if there's a simple CAST
to bit(n).

Maybe INT8 would be a better choice than INT4? But I'm not sure there's
any practical use-case for more than 30 grouping sets anyway. Keep in
mind the actual output volume probably grows like 2^N.

Actually using ROLLUP the output volume only grows linearly with N. I
tend to think that having such a large number of grouping sets would
be unlikely, however, it seems wrong to be putting an arbitrary limit
on it that's significantly smaller than the number of columns allowed
in a table.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Dean Rasheed (#4)
Re: GROUPING

"Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes:

Maybe INT8 would be a better choice than INT4? But I'm not sure
there's any practical use-case for more than 30 grouping sets
anyway. Keep in mind the actual output volume probably grows like
2^N.

Dean> Actually using ROLLUP the output volume only grows linearly with
Dean> N. I tend to think that having such a large number of grouping
Dean> sets would be unlikely, however, it seems wrong to be putting an
Dean> arbitrary limit on it that's significantly smaller than the
Dean> number of columns allowed in a table.

Limit on what exactly?

Consider that in both MSSQL 2014 and Oracle 12 the limit on the number
of arguments in a GROUPING() expression is ... 1.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Andrew Gierth (#5)
Re: GROUPING

On 21 May 2015 at 09:20, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:

"Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes:

Maybe INT8 would be a better choice than INT4? But I'm not sure
there's any practical use-case for more than 30 grouping sets
anyway. Keep in mind the actual output volume probably grows like
2^N.

Dean> Actually using ROLLUP the output volume only grows linearly with
Dean> N. I tend to think that having such a large number of grouping
Dean> sets would be unlikely, however, it seems wrong to be putting an
Dean> arbitrary limit on it that's significantly smaller than the
Dean> number of columns allowed in a table.

Limit on what exactly?

Consider that in both MSSQL 2014 and Oracle 12 the limit on the number
of arguments in a GROUPING() expression is ... 1.

Actually Oracle haven't quite followed the standard. They have 2
separate functions: GROUPING() which only allows 1 parameter, and
GROUPING_ID() which allows multiple parameters, and returns a bitmask
like our GROUPING() function. However, their GROUPING_ID() function
seems to return an arbitrary precision number and allows an arbitrary
number of parameters (well, I tested it up 70 to prove it wasn't a
64-bit number).

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Dean Rasheed (#6)
Re: GROUPING

"Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes:

Consider that in both MSSQL 2014 and Oracle 12 the limit on the number
of arguments in a GROUPING() expression is ... 1.

Dean> Actually Oracle haven't quite followed the standard. They have 2
Dean> separate functions: GROUPING() which only allows 1 parameter, and
Dean> GROUPING_ID() which allows multiple parameters, and returns a
Dean> bitmask like our GROUPING() function. However, their
Dean> GROUPING_ID() function seems to return an arbitrary precision
Dean> number and allows an arbitrary number of parameters (well, I
Dean> tested it up 70 to prove it wasn't a 64-bit number).

True. It can handle more than 128 bits, even - I gave up trying after that.

So. Options:

1) change GROUPING() to return bigint and otherwise leave it as is.

2) change GROUPING() to return numeric.

3) change GROUPING() so that the result type varies with the number of
args. I don't see anything in the spec that actually forbids this - it
just says the return type is implementation-defined exact numeric.

A) in addition to any of the above, implement GROUPING_ID() as a simple
alias for GROUPING().

4) leave GROUPING() alone and add a separate GROUPING_ID() with a
different return type.

B) We don't currently have GROUP_ID() - does anyone want it?

*) any other ideas?

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#7)
Re: GROUPING

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

So. Options:

1) change GROUPING() to return bigint and otherwise leave it as is.

2) change GROUPING() to return numeric.

3) change GROUPING() so that the result type varies with the number of
args. I don't see anything in the spec that actually forbids this - it
just says the return type is implementation-defined exact numeric.

I'd go with (1) --- it's cheap and doesn't lose any capability. If you
do (2), you'd lose useful operations like &, unless you cast back to int
which pretty much defeats the purpose. And (3) is just weird and will
confuse people, especially since numeric and int don't have identical
sets of operations.

Or we could just leave things alone. But (1) seems like pretty cheap
insurance.

A) in addition to any of the above, implement GROUPING_ID() as a simple
alias for GROUPING().

I see no reason to do that. We don't need to copy Oracle-isms; the
language is already complicated enough.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Andres Freund
andres@anarazel.de
In reply to: Andrew Gierth (#7)
Re: GROUPING

On 2015-05-21 16:19:27 +0100, Andrew Gierth wrote:

True. It can handle more than 128 bits, even - I gave up trying after that.

So. Options:

1) change GROUPING() to return bigint and otherwise leave it as is.

2) change GROUPING() to return numeric.

3) change GROUPING() so that the result type varies with the number of
args. I don't see anything in the spec that actually forbids this - it
just says the return type is implementation-defined exact numeric.

A) in addition to any of the above, implement GROUPING_ID() as a simple
alias for GROUPING().

4) leave GROUPING() alone and add a separate GROUPING_ID() with a
different return type.

B) We don't currently have GROUP_ID() - does anyone want it?

I'd vote for either 0) do nothing or 1). I think the use case for
specifying 64+ (or even 32+) columns in grouping is pretty darn
slim. And as you said, it's not that hard to work around it if you need
it, and that's only going to be in an automated fashion anyway.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10David Fetter
david@fetter.org
In reply to: Andrew Gierth (#7)
Re: GROUPING

On Thu, May 21, 2015 at 04:19:27PM +0100, Andrew Gierth wrote:

"Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes:

Consider that in both MSSQL 2014 and Oracle 12 the limit on the number
of arguments in a GROUPING() expression is ... 1.

Dean> Actually Oracle haven't quite followed the standard. They have 2
Dean> separate functions: GROUPING() which only allows 1 parameter, and
Dean> GROUPING_ID() which allows multiple parameters, and returns a
Dean> bitmask like our GROUPING() function. However, their
Dean> GROUPING_ID() function seems to return an arbitrary precision
Dean> number and allows an arbitrary number of parameters (well, I
Dean> tested it up 70 to prove it wasn't a 64-bit number).

True. It can handle more than 128 bits, even - I gave up trying after that.

So. Options:

1) change GROUPING() to return bigint and otherwise leave it as is.

Seems cheap and reasonable. Making sure people know that GROUPING can
be called multiple times seems like another cheap and reasonable
measure.

*) any other ideas?

How about a more sensible data structure as a PG-specific addon.
GROUPING_JSON() seems like just the thing.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: David Fetter (#10)
Re: GROUPING

"David" == David Fetter <david@fetter.org> writes:

David> How about a more sensible data structure as a PG-specific addon.
David> GROUPING_JSON() seems like just the thing.

What exactly do you think it should return?

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Gierth (#11)
Re: GROUPING

On Thu, May 21, 2015 at 12:21 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"David" == David Fetter <david@fetter.org> writes:

David> How about a more sensible data structure as a PG-specific addon.
David> GROUPING_JSON() seems like just the thing.

What exactly do you think it should return?

I vote for { "rube" : "goldberg" }.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: David Fetter (#10)
Re: GROUPING

On 21 May 2015 at 17:15, David Fetter <david@fetter.org> wrote:

On Thu, May 21, 2015 at 04:19:27PM +0100, Andrew Gierth wrote:

"Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes:

Consider that in both MSSQL 2014 and Oracle 12 the limit on the number
of arguments in a GROUPING() expression is ... 1.

Dean> Actually Oracle haven't quite followed the standard. They have 2
Dean> separate functions: GROUPING() which only allows 1 parameter, and
Dean> GROUPING_ID() which allows multiple parameters, and returns a
Dean> bitmask like our GROUPING() function. However, their
Dean> GROUPING_ID() function seems to return an arbitrary precision
Dean> number and allows an arbitrary number of parameters (well, I
Dean> tested it up 70 to prove it wasn't a 64-bit number).

True. It can handle more than 128 bits, even - I gave up trying after that.

So. Options:

1) change GROUPING() to return bigint and otherwise leave it as is.

Seems cheap and reasonable. Making sure people know that GROUPING can
be called multiple times seems like another cheap and reasonable
measure.

Yeah, seems reasonable. The lack of any bitwise operations on numeric
makes it pretty-much useless in this context.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Andres Freund (#9)
1 attachment(s)
Re: GROUPING

"Andres" == Andres Freund <andres@anarazel.de> writes:

Andres> I'd vote for either 0) do nothing or 1). I think the use case
Andres> for specifying 64+ (or even 32+) columns in grouping is pretty
Andres> darn slim. And as you said, it's not that hard to work around
Andres> it if you need it, and that's only going to be in an automated
Andres> fashion anyway.

If the vote goes with (1), this patch ought to suffice:

--
Andrew (irc:RhodiumToad)

Attachments:

grouping-bigint.patchtext/x-patchDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 89a609f..e0eeae0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -13350,10 +13350,10 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
        <function>GROUPING(<replaceable class="parameter">args...</replaceable>)</function>
       </entry>
       <entry>
-       <type>integer</type>
+       <type>bigint</type>
       </entry>
       <entry>
-       Integer bitmask indicating which arguments are not being included in the current
+       Bitmask indicating which arguments are not being included in the current
        grouping set
       </entry>
      </row>
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index d414e20..70e9c28 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -3034,10 +3034,10 @@ ExecEvalGroupingFuncExpr(GroupingFuncExprState *gstate,
 						 bool *isNull,
 						 ExprDoneCond *isDone)
 {
-	int result = 0;
-	int attnum = 0;
-	Bitmapset *grouped_cols = gstate->aggstate->grouped_cols;
-	ListCell *lc;
+	int64		result = 0;
+	int			attnum = 0;
+	Bitmapset  *grouped_cols = gstate->aggstate->grouped_cols;
+	ListCell   *lc;
 
 	if (isDone)
 		*isDone = ExprSingleResult;
@@ -3054,7 +3054,7 @@ ExecEvalGroupingFuncExpr(GroupingFuncExprState *gstate,
 			result = result | 1;
 	}
 
-	return (Datum) result;
+	return Int64GetDatum(result);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 4176393..baa3303 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -55,7 +55,7 @@ exprType(const Node *expr)
 			type = ((const Aggref *) expr)->aggtype;
 			break;
 		case T_GroupingFunc:
-			type = INT4OID;
+			type = INT8OID;
 			break;
 		case T_WindowFunc:
 			type = ((const WindowFunc *) expr)->wintype;
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 1e3f2e0..8119af5 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -238,10 +238,10 @@ transformGroupingFunc(ParseState *pstate, GroupingFunc *p)
 	List	   *result_list = NIL;
 	GroupingFunc *result = makeNode(GroupingFunc);
 
-	if (list_length(args) > 31)
+	if (list_length(args) > 63)
 		ereport(ERROR,
 				(errcode(ERRCODE_TOO_MANY_ARGUMENTS),
-				 errmsg("GROUPING must have fewer than 32 arguments"),
+				 errmsg("GROUPING must have fewer than 64 arguments"),
 				 parser_errposition(pstate, p->location)));
 
 	foreach(lc, args)
#15David Fetter
david@fetter.org
In reply to: Robert Haas (#12)
Re: GROUPING

On Thu, May 21, 2015 at 12:24:03PM -0400, Robert Haas wrote:

On Thu, May 21, 2015 at 12:21 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"David" == David Fetter <david@fetter.org> writes:

David> How about a more sensible data structure as a PG-specific addon.
David> GROUPING_JSON() seems like just the thing.

What exactly do you think it should return?

I vote for { "rube" : "goldberg" }.

Your point is well taken. I had { "target_list_name" : false, ... }

How about GROUPING_BYTEA()?

Also is there a really great reason that bitwise operations don't work
on NUMERIC? Lack of tuits is a good reason, but not, it seems to me,
a great one.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#15)
Re: GROUPING

David Fetter <david@fetter.org> writes:

Also is there a really great reason that bitwise operations don't work
on NUMERIC? Lack of tuits is a good reason, but not, it seems to me,
a great one.

Not sure that bitwise operations make too much sense on values that
are (a) possibly fractional and (b) inherently decimal not binary.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers