Hash grouping, aggregates

Started by Bruce Momjianabout 23 years ago8 messageshackers
Jump to latest
#1Bruce Momjian
bruce@momjian.us

So one of the items on the TODO list is "Add hash for evaluating GROUP BY
aggregates (Tom)"

I'm finding this would benefit a lot of my queries. Most of the time seems to
be going into sorts for group by clauses. I don't know how long it would take
to build a hash of course, but I suspect it would be less than the sort.

Is this something a beginner could figure out? I'm thinking I need a normal
Hash node that builds exactly the same kind of hash as a join, then a HashScan
node that picks all the rows out of the hash.

The neat thing is that hash aggregates would allow grouping on data types that
have = operators but no useful < operator.

(Incidentally, I'm fond of "nested loop", I remember when I was a beginner SQL
programmer looking at plans and it was intuitively obvious what it meant. I
suspect for a beginner looking at "nestloop" it might not be quite so
obvious.)

--
greg

#2Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#1)
Re: Hash grouping, aggregates

Bruno Wolff III <bruno@wolff.to> writes:

This is already in 7.4. You could try it out by building from CVS.
From the HISTORY file:
System can use either hash- or sort-based strategy for grouped
aggregation

Ooh, doing that now. Thanks.

--
greg

#3Bruno Wolff III
bruno@wolff.to
In reply to: Bruce Momjian (#1)
Re: Hash grouping, aggregates

On Tue, Feb 11, 2003 at 09:48:11 -0500,
Greg Stark <gsstark@mit.edu> wrote:

So one of the items on the TODO list is "Add hash for evaluating GROUP BY
aggregates (Tom)"

I'm finding this would benefit a lot of my queries. Most of the time seems to
be going into sorts for group by clauses. I don't know how long it would take
to build a hash of course, but I suspect it would be less than the sort.

Is this something a beginner could figure out? I'm thinking I need a normal
Hash node that builds exactly the same kind of hash as a join, then a HashScan
node that picks all the rows out of the hash.

This is already in 7.4. You could try it out by building from CVS.

From the HISTORY file:

System can use either hash- or sort-based strategy for grouped
aggregation

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: Hash grouping, aggregates

Greg Stark <gsstark@mit.edu> writes:

So one of the items on the TODO list is "Add hash for evaluating GROUP BY
aggregates (Tom)"

It's done in CVS tip ... give it a try.

The neat thing is that hash aggregates would allow grouping on data types that
have = operators but no useful < operator.

Hm. Right now I think that would barf on you, because the parser wants
to find the '<' operator to label the grouping column with, even if the
planner later decides not to use it. It'd take some redesign of the
query data structure (specifically SortClause/GroupClause) to avoid that.

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: Hash grouping, aggregates

Bruno Wolff III <bruno@wolff.to> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Greg Stark <gsstark@mit.edu> writes:

The neat thing is that hash aggregates would allow grouping on data types that
have = operators but no useful < operator.

Hm. Right now I think that would barf on you, because the parser wants
to find the '<' operator to label the grouping column with, even if the
planner later decides not to use it. It'd take some redesign of the
query data structure (specifically SortClause/GroupClause) to avoid that.

I think another issue is that for some = operators you still might not
be able to use a hash. I would expect the discussion for hash joins in
http://developer.postgresql.org/docs/postgres/xoper-optimization.html
would to hash aggregates as well.

Right, the = operator must be hashable or you're out of luck. But we
could imagine tweaking the parser to allow GROUP BY if it finds a
hashable = operator and no sort operator. The only objection I can see
to this is that it means the planner *must* use hash aggregation, which
might be a bad move if there are too many distinct groups.

regards, tom lane

#6Bruno Wolff III
bruno@wolff.to
In reply to: Tom Lane (#4)
Re: Hash grouping, aggregates

On Tue, Feb 11, 2003 at 10:41:53 -0500,
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Greg Stark <gsstark@mit.edu> writes:

So one of the items on the TODO list is "Add hash for evaluating GROUP BY
aggregates (Tom)"

It's done in CVS tip ... give it a try.

The neat thing is that hash aggregates would allow grouping on data types that
have = operators but no useful < operator.

Hm. Right now I think that would barf on you, because the parser wants
to find the '<' operator to label the grouping column with, even if the
planner later decides not to use it. It'd take some redesign of the
query data structure (specifically SortClause/GroupClause) to avoid that.

I think another issue is that for some = operators you still might not
be able to use a hash. I would expect the discussion for hash joins in
http://developer.postgresql.org/docs/postgres/xoper-optimization.html
would to hash aggregates as well.

#7Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#5)
Re: Hash grouping, aggregates

Tom Lane kirjutas T, 11.02.2003 kell 18:39:

Bruno Wolff III <bruno@wolff.to> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Greg Stark <gsstark@mit.edu> writes:

The neat thing is that hash aggregates would allow grouping on data types that
have = operators but no useful < operator.

Hm. Right now I think that would barf on you, because the parser wants
to find the '<' operator to label the grouping column with, even if the
planner later decides not to use it. It'd take some redesign of the
query data structure (specifically SortClause/GroupClause) to avoid that.

I think another issue is that for some = operators you still might not
be able to use a hash. I would expect the discussion for hash joins in
http://developer.postgresql.org/docs/postgres/xoper-optimization.html
would to hash aggregates as well.

Right, the = operator must be hashable or you're out of luck. But we
could imagine tweaking the parser to allow GROUP BY if it finds a
hashable = operator and no sort operator. The only objection I can see
to this is that it means the planner *must* use hash aggregation, which
might be a bad move if there are too many distinct groups.

If we run out of sort memory, we can always bail out later, preferrably
with a descriptive error message. It is not as elegant as erring out at
parse (or even plan/optimise) time, but the result is /almost/ the same.

Relying on hash aggregation will become essential if we are ever going
to implement the "other" groupings (CUBE, ROLLUP, (), ...), so it would
be nice if hash aggregation could also overflow to disk - I suspect that
this will still be faster that running an independent scan for each
GROUP BY grouping and merging the results.

-----
Hannu

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#7)
Re: Hash grouping, aggregates

Hannu Krosing <hannu@tm.ee> writes:

Relying on hash aggregation will become essential if we are ever going
to implement the "other" groupings (CUBE, ROLLUP, (), ...), so it would
be nice if hash aggregation could also overflow to disk

I did not make this happen, but it sounds like Joe and Natassa are
about to send their students out to do it ... in a month or so, we
can review the homework assignments and decide who gets an A ;-)

regards, tom lane