Hash support for grouping sets
Herewith a patch for doing grouping sets via hashing or mixed hashing
and sorting.
The principal objective is to pick whatever combination of grouping sets
has an estimated size that fits in work_mem while minimizing the number
of sorting passes we need to do over the data, and hash those. (Yes,
this is a knapsack problem.)
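As an aside for reviewers, the shape of that selection problem can be sketched standalone as a classic 0/1 knapsack (illustrative only: the names here are invented, sizes are assumed to be pre-scaled to integer units, and the patch's actual selection logic over chains of sets differs):

```c
#define MAX_SETS 32
#define MAX_CAP  1024			/* capacity in coarse, pre-scaled units */

/*
 * Toy 0/1 knapsack: weights[] are each grouping set's estimated
 * hash-table size scaled to integer units, values[] are the number of
 * sorting passes that hashing the set would eliminate, and capacity is
 * the scaled work_mem budget.  Marks chosen sets in take[] and returns
 * the best achievable total value.
 */
static int
choose_sets_to_hash(int nsets, const int weights[], const int values[],
					int capacity, int take[])
{
	static int	dp[MAX_SETS + 1][MAX_CAP + 1];

	for (int c = 0; c <= capacity; c++)
		dp[0][c] = 0;

	for (int i = 1; i <= nsets; i++)
		for (int c = 0; c <= capacity; c++)
		{
			dp[i][c] = dp[i - 1][c];	/* option: don't hash set i */
			if (weights[i - 1] <= c &&
				dp[i - 1][c - weights[i - 1]] + values[i - 1] > dp[i][c])
				dp[i][c] = dp[i - 1][c - weights[i - 1]] + values[i - 1];
		}

	/* walk the table backwards to recover which sets were chosen */
	for (int i = nsets, c = capacity; i > 0; i--)
	{
		take[i - 1] = (dp[i][c] != dp[i - 1][c]);
		if (take[i - 1])
			c -= weights[i - 1];
	}
	return dp[nsets][capacity];
}
```

With weights {3, 4, 5} and unit values under a capacity of 7, this picks the first two sets to hash and leaves the third to be sorted.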
This is achieved by adding another strategy to the Agg node; AGG_MIXED
means that both hashing and (sorted or plain) grouping happen. In
addition, support for multiple hash tables in AGG_HASHED mode is added.
Sample plans look like this:
explain select two,ten from onek group by cube(two,ten);
QUERY PLAN
--------------------------------------------------------------
 MixedAggregate  (cost=0.00..89.33 rows=33 width=8)
   Hash Key: two, ten
   Hash Key: two
   Hash Key: ten
   Group Key: ()
   ->  Seq Scan on onek  (cost=0.00..79.00 rows=1000 width=8)
explain select two,ten from onek group by two, cube(ten,twenty);
QUERY PLAN
---------------------------------------------------------------
 HashAggregate  (cost=86.50..100.62 rows=162 width=12)
   Hash Key: two, ten, twenty
   Hash Key: two, ten
   Hash Key: two
   Hash Key: two, twenty
   ->  Seq Scan on onek  (cost=0.00..79.00 rows=1000 width=12)
set work_mem='64kB';
explain select count(*) from tenk1
group by grouping sets (unique1,thousand,hundred,ten,two);
QUERY PLAN
------------------------------------------------------------------------
 MixedAggregate  (cost=1535.39..3023.89 rows=11112 width=28)
   Hash Key: two
   Hash Key: ten
   Hash Key: hundred
   Group Key: unique1
   Sort Key: thousand
     Group Key: thousand
   ->  Sort  (cost=1535.39..1560.39 rows=10000 width=20)
         Sort Key: unique1
         ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=20)
(10 rows)
Columns with unhashable or unsortable data types are handled
appropriately, though obviously every individual grouping set must end
up containing compatible column types.
There is one current weakness which I don't see a good solution for: the
planner code still has to pick a single value for group_pathkeys before
planning the input path. This means that we sometimes can't choose a
minimal set of sorts, because we may have chosen a sort order for a
grouping set that should have been hashed, for example:
explain select count(*) from tenk1
group by grouping sets (two,unique1,thousand,hundred,ten);
QUERY PLAN
------------------------------------------------------------------------
 MixedAggregate  (cost=1535.39..4126.28 rows=11112 width=28)
   Hash Key: ten
   Hash Key: hundred
   Group Key: two
   Sort Key: unique1
     Group Key: unique1
   Sort Key: thousand
     Group Key: thousand
   ->  Sort  (cost=1535.39..1560.39 rows=10000 width=20)
         Sort Key: two
         ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=20)
(3 sorts, vs. 2 in the previous example, which listed the same grouping
sets in a different order)
Current status of the work: probably still needs cleanup, maybe some
better regression tests, and Andres has expressed some concern over the
performance impact of additional complexity in advance_aggregates; it
would be useful if this could be tested. But it should be reviewable
as-is.
--
Andrew (irc:RhodiumToad)
Attachments:
gshash10.patch (text/x-patch, +2365/-555)
On Thu, Jan 5, 2017 at 10:48 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> Herewith a patch for doing grouping sets via hashing or mixed hashing
> and sorting.
Cool.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
The ability to exploit hashed aggregation within sorted groups, when the order of the input stream can be exploited this way, is potentially a useful way to improve aggregation performance more generally. This would potentially be beneficial when the input size is expected to be larger than the amount of working memory available for hashed aggregation, but where there is enough memory to hash-aggregate just the unsorted grouping key combinations, and when the cumulative cost of rebuilding the hash table for each sorted subgroup is less than the cost of sorting the entire input. In other words, if most of the grouping key combinations are already segregated by virtue of the input order, then hashing the remaining combinations within each sorted group might be done in memory, at the cost of rebuilding the hash table for each sorted subgroup.
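The per-subgroup rebuild described above can be sketched in miniature (hypothetical code, not from the patch: a count per (grp, sub) pair over input already sorted by grp, with a small fixed-size table standing in for the real hash machinery):

```c
#include <string.h>

#define TABSIZE 64				/* per-subgroup table; assumed to fit in memory */

typedef struct
{
	int			key;
	long		count;
	int			used;
} Slot;

/*
 * Count rows per (grp, sub) pair, where the input is already sorted by
 * grp but not by sub.  Rather than sorting the whole input on (grp,
 * sub), rebuild a small hash table of sub-keys for each run of equal
 * grp values.  Results are appended to the out_* arrays; returns the
 * number of result rows.
 */
static int
agg_hash_within_sorted_groups(const int grp[], const int sub[], int n,
							  int out_grp[], int out_sub[], long out_cnt[])
{
	Slot		tab[TABSIZE];
	int			nout = 0;
	int			i = 0;

	while (i < n)
	{
		int			g = grp[i];

		memset(tab, 0, sizeof(tab));	/* cheap rebuild per subgroup */

		for (; i < n && grp[i] == g; i++)
		{
			unsigned	h = (unsigned) sub[i] % TABSIZE;

			while (tab[h].used && tab[h].key != sub[i])
				h = (h + 1) % TABSIZE;	/* linear probing */
			tab[h].used = 1;
			tab[h].key = sub[i];
			tab[h].count++;
		}

		for (int s = 0; s < TABSIZE; s++)	/* flush this subgroup */
			if (tab[s].used)
			{
				out_grp[nout] = g;
				out_sub[nout] = tab[s].key;
				out_cnt[nout] = tab[s].count;
				nout++;
			}
	}
	return nout;
}
```

The memset per run is the "cost of rebuilding the hash table for each sorted subgroup" referred to above; whether it beats a full sort depends on the run lengths and key distribution.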
I haven’t looked at the code for this change yet (I hope I will have the time to do that). Ideally the decision to choose the aggregation method as sorted, hashed, or mixed hash/sort should be integrated into the cost model, but given the notorious difficulty of estimating intermediate cardinalities accurately it would be difficult to develop a cardinality model and a cost model accurate enough to choose among these options consistently well.
Jim Finnerty
Amazon Corp.
On 1/10/17, 10:22 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:

> On Thu, Jan 5, 2017 at 10:48 PM, Andrew Gierth
> <andrew@tao11.riddles.org.uk> wrote:
>> Herewith a patch for doing grouping sets via hashing or mixed hashing
>> and sorting.
>
> Cool.
On Mon, Jan 16, 2017 at 10:59 AM, Finnerty, Jim <jfinnert@amazon.com> wrote:
> The ability to exploit hashed aggregation within sorted groups, when the order of the input stream can be exploited this way, is potentially a useful way to improve aggregation performance more generally. This would potentially be beneficial when the input size is expected to be larger than the amount of working memory available for hashed aggregation, but where there is enough memory to hash-aggregate just the unsorted grouping key combinations, and when the cumulative cost of rebuilding the hash table for each sorted subgroup is less than the cost of sorting the entire input. In other words, if most of the grouping key combinations are already segregated by virtue of the input order, then hashing the remaining combinations within each sorted group might be done in memory, at the cost of rebuilding the hash table for each sorted subgroup.
Neat idea.
> I haven’t looked at the code for this change yet (I hope I will have the time to do that). Ideally the decision to choose the aggregation method as sorted, hashed, or mixed hash/sort should be integrated into the cost model, but given the notorious difficulty of estimating intermediate cardinalities accurately it would be difficult to develop a cardinality model and a cost model accurate enough to choose among these options consistently well.
Yes, that might be a little tricky.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 6 January 2017 at 03:48, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> Herewith a patch for doing grouping sets via hashing or mixed hashing
> and sorting.
> [...]
> Current status of the work: probably still needs cleanup, maybe some
> better regression tests, and Andres has expressed some concern over the
> performance impact of additional complexity in advance_aggregates; it
> would be useful if this could be tested. But it should be reviewable
> as-is.

This doesn't apply cleanly to latest master. Could you please post a
rebased patch?
Thanks
Thom
"Thom" == Thom Brown <thom@linux.com> writes:
Thom> This doesn't apply cleanly to latest master. Could you please
Thom> post a rebased patch?
Sure.
--
Andrew (irc:RhodiumToad)
Attachments:
gshash11.patch (text/x-patch, +2365/-554)
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested
On my MacBook, `make check-world` gives differences in the contrib modules:
cat contrib/postgres_fdw/regression.diffs
*** /Users/mark/hydra/postgresql.review/contrib/postgres_fdw/expected/postgres_fdw.out 2017-03-03 13:33:47.000000000 -0800
--- /Users/mark/hydra/postgresql.review/contrib/postgres_fdw/results/postgres_fdw.out 2017-03-07 13:27:56.000000000 -0800
***************
*** 3148,3163 ****
-- Grouping sets
explain (verbose, costs off)
select c2, sum(c1) from ft1 where c2 < 3 group by rollup(c2) order by 1 nulls last;
! QUERY PLAN
! ---------------------------------------------------------------------------------------------------
! GroupAggregate
! Output: c2, sum(c1)
! Group Key: ft1.c2
! Group Key: ()
! -> Foreign Scan on public.ft1
! Output: c2, c1
! Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE ((c2 < 3)) ORDER BY c2 ASC NULLS LAST
! (7 rows)
select c2, sum(c1) from ft1 where c2 < 3 group by rollup(c2) order by 1 nulls last;
c2 | sum
--- 3148,3166 ----
-- Grouping sets
explain (verbose, costs off)
select c2, sum(c1) from ft1 where c2 < 3 group by rollup(c2) order by 1 nulls last;
! QUERY PLAN
! ------------------------------------------------------------------------------
! Sort
! Output: c2, (sum(c1))
! Sort Key: ft1.c2
! -> MixedAggregate
! Output: c2, sum(c1)
! Hash Key: ft1.c2
! Group Key: ()
! -> Foreign Scan on public.ft1
! Output: c2, c1
! Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE ((c2 < 3))
! (10 rows)
select c2, sum(c1) from ft1 where c2 < 3 group by rollup(c2) order by 1 nulls last;
c2 | sum
***************
*** 3170,3185 ****
explain (verbose, costs off)
select c2, sum(c1) from ft1 where c2 < 3 group by cube(c2) order by 1 nulls last;
! QUERY PLAN
! ---------------------------------------------------------------------------------------------------
! GroupAggregate
! Output: c2, sum(c1)
! Group Key: ft1.c2
! Group Key: ()
! -> Foreign Scan on public.ft1
! Output: c2, c1
! Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE ((c2 < 3)) ORDER BY c2 ASC NULLS LAST
! (7 rows)
select c2, sum(c1) from ft1 where c2 < 3 group by cube(c2) order by 1 nulls last;
c2 | sum
--- 3173,3191 ----
explain (verbose, costs off)
select c2, sum(c1) from ft1 where c2 < 3 group by cube(c2) order by 1 nulls last;
! QUERY PLAN
! ------------------------------------------------------------------------------
! Sort
! Output: c2, (sum(c1))
! Sort Key: ft1.c2
! -> MixedAggregate
! Output: c2, sum(c1)
! Hash Key: ft1.c2
! Group Key: ()
! -> Foreign Scan on public.ft1
! Output: c2, c1
! Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE ((c2 < 3))
! (10 rows)
select c2, sum(c1) from ft1 where c2 < 3 group by cube(c2) order by 1 nulls last;
c2 | sum
***************
*** 3192,3211 ****
explain (verbose, costs off)
select c2, c6, sum(c1) from ft1 where c2 < 3 group by grouping sets(c2, c6) order by 1 nulls last, 2 nulls last;
! QUERY PLAN
! -------------------------------------------------------------------------------------------------------------
Sort
Output: c2, c6, (sum(c1))
Sort Key: ft1.c2, ft1.c6
! -> GroupAggregate
Output: c2, c6, sum(c1)
! Group Key: ft1.c2
! Sort Key: ft1.c6
! Group Key: ft1.c6
-> Foreign Scan on public.ft1
Output: c2, c6, c1
! Remote SQL: SELECT "C 1", c2, c6 FROM "S 1"."T 1" WHERE ((c2 < 3)) ORDER BY c2 ASC NULLS LAST
! (11 rows)
select c2, c6, sum(c1) from ft1 where c2 < 3 group by grouping sets(c2, c6) order by 1 nulls last, 2 nulls last;
c2 | c6 | sum
--- 3198,3216 ----
explain (verbose, costs off)
select c2, c6, sum(c1) from ft1 where c2 < 3 group by grouping sets(c2, c6) order by 1 nulls last, 2 nulls last;
! QUERY PLAN
! ----------------------------------------------------------------------------------
Sort
Output: c2, c6, (sum(c1))
Sort Key: ft1.c2, ft1.c6
! -> HashAggregate
Output: c2, c6, sum(c1)
! Hash Key: ft1.c2
! Hash Key: ft1.c6
-> Foreign Scan on public.ft1
Output: c2, c6, c1
! Remote SQL: SELECT "C 1", c2, c6 FROM "S 1"."T 1" WHERE ((c2 < 3))
! (10 rows)
select c2, c6, sum(c1) from ft1 where c2 < 3 group by grouping sets(c2, c6) order by 1 nulls last, 2 nulls last;
c2 | c6 | sum
======================================================================
On Mar 7, 2017, at 1:43 PM, Mark Dilger <hornschnorter@gmail.com> wrote:
> On my MacBook, `make check-world` gives differences in the contrib modules:
I get the same (or similar -- didn't check) regression failure on CentOS, so this
doesn't appear to be MacBook / hardware specific.
Mark Dilger
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested
On linux/gcc the patch generates a warning in nodeAgg.c that is fairly easy
to fix. Using -Werror to make catching the error easier, I get:
make[3]: Entering directory `/home/mark/postgresql.clean/src/backend/executor'
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -O2 -Werror -I../../../src/include -D_GNU_SOURCE -c -o nodeAgg.o nodeAgg.c -MMD -MP -MF .deps/nodeAgg.Po
cc1: warnings being treated as errors
nodeAgg.c: In function ‘ExecAgg’:
nodeAgg.c:2053: error: ‘result’ may be used uninitialized in this function
make[3]: *** [nodeAgg.o] Error 1
make[3]: Leaving directory `/home/mark/postgresql.clean/src/backend/executor'
make[2]: *** [executor-recursive] Error 2
make[2]: Leaving directory `/home/mark/postgresql.clean/src/backend'
make[1]: *** [all-backend-recurse] Error 2
make[1]: Leaving directory `/home/mark/postgresql.clean/src'
make: *** [all-src-recurse] Error 2
There are two obvious choices to silence the warning:
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index f059560..8959f5b 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2066,6 +2066,7 @@ ExecAgg(AggState *node)
break;
case AGG_PLAIN:
case AGG_SORTED:
+ default:
result = agg_retrieve_direct(node);
break;
}
which I like better, because I don't expect it would create
an extra instruction in the compiled code, but also:
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index f059560..eab2605 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2050,7 +2050,7 @@ lookup_hash_entries(AggState *aggstate)
TupleTableSlot *
ExecAgg(AggState *node)
{
- TupleTableSlot *result;
+ TupleTableSlot *result = NULL;
if (!node->agg_done)
{
"Mark" == Mark Dilger <hornschnorter@gmail.com> writes:
Mark> On linux/gcc the patch generates a warning in nodeAgg.c that is
Mark> fairly easy to fix. Using -Werror to make catching the error
Mark> easier, I get:
what gcc version is this exactly?
--
Andrew (irc:RhodiumToad)
"Mark" == Mark Dilger <hornschnorter@gmail.com> writes:
Mark> On my MacBook, `make check-world` gives differences in the
Mark> contrib modules:
Thanks! Latest cleaned up version of patch is attached.
--
Andrew (irc:RhodiumToad)
Attachments:
gshash23.patch (text/x-patch, +2490/-583)
On Mar 8, 2017, at 2:30 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
"Mark" == Mark Dilger <hornschnorter@gmail.com> writes:
Mark> On linux/gcc the patch generates a warning in nodeAgg.c that is
Mark> fairly easy to fix. Using -Werror to make catching the error
Mark> easier, I get:

what gcc version is this exactly?
Linux version 2.6.32-573.18.1.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Tue Feb 9 22:46:17 UTC 2016
On Mar 8, 2017, at 5:47 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
"Mark" == Mark Dilger <hornschnorter@gmail.com> writes:
Mark> On my MacBook, `make check-world` gives differences in the
Mark> contrib modules:

Thanks! Latest cleaned up version of patch is attached.
This fixes both the warning and the contrib tests on linux and mac.
Hi Andrew,
Reviewing the patch a bit more, I find it hard to understand the comment about
passing -1 as a flag for finalize_aggregates. Any chance you can spend a bit
more time word-smithing that code comment?
@@ -1559,7 +1647,9 @@ prepare_projection_slot(AggState *aggstate, TupleTableSlot *slot, int currentSet
/*
* Compute the final value of all aggregates for one group.
*
- * This function handles only one grouping set at a time.
+ * This function handles only one grouping set at a time. But in the hash
+ * case, it's the caller's responsibility to have selected the set already, and
+ * we pass in -1 here to flag that and to control the indexing into pertrans.
*
* Results are stored in the output econtext aggvalues/aggnulls.
*/
@@ -1575,10 +1665,11 @@ finalize_aggregates(AggState *aggstate,
int aggno;
int transno;
- Assert(currentSet == 0 ||
- ((Agg *) aggstate->ss.ps.plan)->aggstrategy != AGG_HASHED);
-
- aggstate->current_set = currentSet;
+ /* ugly hack for hash */
+ if (currentSet >= 0)
+ select_current_set(aggstate, currentSet, false);
+ else
+ currentSet = 0;
"Mark" == Mark Dilger <hornschnorter@gmail.com> writes:
Mark> Hi Andrew,
Mark> Reviewing the patch a bit more, I find it hard to understand the
Mark> comment about passing -1 as a flag for finalize_aggregates. Any
Mark> chance you can spend a bit more time word-smithing that code
Mark> comment?
Sure.
How does this look for wording (I'll incorporate it into an updated
patch later):
/*
 * Compute the final value of all aggregates for one group.
 *
 * This function handles only one grouping set at a time. But in the hash
 * case, it's the caller's responsibility to have selected the set already, and
 * we pass in -1 as currentSet here to flag that; this also changes how we
 * handle the indexing into AggStatePerGroup as explained below.
 *
 * Results are stored in the output econtext aggvalues/aggnulls.
 */
static void
finalize_aggregates(AggState *aggstate,
					AggStatePerAgg peraggs,
					AggStatePerGroup pergroup,
					int currentSet)
{
	ExprContext *econtext = aggstate->ss.ps.ps_ExprContext;
	Datum	   *aggvalues = econtext->ecxt_aggvalues;
	bool	   *aggnulls = econtext->ecxt_aggnulls;
	int			aggno;
	int			transno;

	/*
	 * If currentSet >= 0, then we're doing sorted grouping, and pergroup is
	 * an array of size numTrans*numSets which we have to index into using
	 * currentSet in addition to transno. The caller may not have selected
	 * the set, so we do that.
	 *
	 * If currentSet < 0, then we're doing hashed grouping, and pergroup is
	 * an array of only numTrans items (since for hashed grouping, each
	 * grouping set is in a separate hashtable). We rely on the caller having
	 * done select_current_set, and we fudge currentSet to 0 in order to make
	 * the same indexing calculations work as for the grouping case.
	 */
	if (currentSet >= 0)
		select_current_set(aggstate, currentSet, false);
	else
		currentSet = 0;
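The two indexing regimes that comment describes boil down to one address computation. A standalone illustration (invented names; the real array holds per-transition working state, not integers):

```c
/*
 * pergroup is laid out as numSets consecutive blocks of numTrans
 * entries.  Sorted grouping indexes with the selected set; hashed
 * grouping passes currentSet = 0 against a single-set array, so the
 * same formula works for both.
 */
static long
pergroup_fetch(const long pergroup[], int numTrans,
			   int currentSet, int transno)
{
	return pergroup[(currentSet * numTrans) + transno];
}
```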
--
Andrew (irc:RhodiumToad)
On Mar 8, 2017, at 9:40 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
"Mark" == Mark Dilger <hornschnorter@gmail.com> writes:
Mark> Hi Andrew,
Mark> Reviewing the patch a bit more, I find it hard to understand the
Mark> comment about passing -1 as a flag for finalize_aggregates. Any
Mark> chance you can spend a bit more time word-smithing that code
Mark> comment?

Sure.
How does this look for wording (I'll incorporate it into an updated
patch later):
Much better. Thanks!
"Mark" == Mark Dilger <hornschnorter@gmail.com> writes:
Mark> Hi Andrew,
Mark> Reviewing the patch a bit more, I find it hard to understand the
Mark> comment about passing -1 as a flag for finalize_aggregates. Any
Mark> chance you can spend a bit more time word-smithing that code
Mark> comment?
Actually, ignore that prior response, I'll refactor it a bit to remove
the need for the comment at all.
--
Andrew (irc:RhodiumToad)
"Mark" == Mark Dilger <hornschnorter@gmail.com> writes:
Mark> Reviewing the patch a bit more, I find it hard to understand the
Mark> comment about passing -1 as a flag for finalize_aggregates. Any
Mark> chance you can spend a bit more time word-smithing that code
Mark> comment?
Sure.
How does this look for wording (I'll incorporate it into an updated
patch later):
Mark> Much better. Thanks!
Here's a version of the patch that obviates that comment entirely, by
moving the differences between the hashed and sorted cases out to the
separate callers of finalize_aggregates.
--
Andrew (irc:RhodiumToad)
Attachments:
gshash24.patch (text/x-patch, +2494/-591)
Another small update to the patch, this time to eliminate any
possibility of integer overflow when handling extremely large estimated
groupings.
Key change:
- k_weights[i] = (int) floor(sz / scale);
+ /*
+ * If sz is enormous, but work_mem (and hence scale) is
+ * small, avoid integer overflow here.
+ */
+ k_weights[i] = (int) Min(floor(sz / scale),
+ k_capacity + 1.0);
--
Andrew (irc:RhodiumToad)
Attachments:
gshash25.patch (text/x-patch, +2503/-591)
[snip]
This thread seems to have gone quiet - is it time for me to just go
ahead and commit the thing anyway? Anyone else want to weigh in?
--
Andrew (irc:RhodiumToad)