Additional size of hash table is always zero for hash aggregates

Started by Pengzhou Tang almost 6 years ago · 12 messages
#1Pengzhou Tang
ptang@pivotal.io
1 attachment

Hi hacker,

While reading the grouping sets code, I noticed that the additional
size of the hash table for hash aggregates is always zero, which seems
incorrect to me. Attached is a patch to fix it; please help check it.

Thanks,
Pengzhou

Attachments:

0001-Set-numtrans-correctly-when-building-hash-aggregate-.patch (application/octet-stream)
From 286861adbf418fa91ce357473f56965b9f919af0 Mon Sep 17 00:00:00 2001
From: Pengzhou Tang <ptang@pivotal.io>
Date: Thu, 12 Mar 2020 01:24:32 -0400
Subject: [PATCH] Set numtrans correctly when building hash aggregate tables

aggstate->numtrans is always zero when building hash aggregate
tables, which is incorrect. Fix it by delaying the build until the
transitions are initialized.
---
 src/backend/executor/nodeAgg.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 7aebb247d8..9d7a35d653 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2520,10 +2520,6 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	{
 		/* this is an array of pointers, not structures */
 		aggstate->hash_pergroup = pergroups;
-
-		find_hash_columns(aggstate);
-		build_hash_tables(aggstate);
-		aggstate->table_filled = false;
 	}
 
 	/*
@@ -2879,6 +2875,14 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 				(errcode(ERRCODE_GROUPING_ERROR),
 				 errmsg("aggregate function calls cannot be nested")));
 
+	/* Initialize hash tables for hash aggregates */
+	if (use_hashing)
+	{
+		find_hash_columns(aggstate);
+		build_hash_tables(aggstate);
+		aggstate->table_filled = false;
+	}
+
 	/*
 	 * Build expressions doing all the transition work at once. We build a
 	 * different one for each phase, as the number of transition function
-- 
2.14.1
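
For context, the zeroed value feeds the additionalsize argument of
BuildTupleHashTableExt(). A simplified sketch of the computation in
build_hash_table() in nodeAgg.c of that era:

	/* per-entry space for each aggregate's transition state; before
	 * this fix, aggstate->numtrans was still 0 when this ran */
	additionalsize = aggstate->numtrans * sizeof(AggStatePerGroupData);

Because ExecInitAgg() built the hash tables before initializing the
per-trans data, the product was always zero.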

#2Andres Freund
andres@anarazel.de
In reply to: Pengzhou Tang (#1)
Re: Additional size of hash table is always zero for hash aggregates

Hi,

On 2020-03-12 16:35:15 +0800, Pengzhou Tang wrote:
> While reading the grouping sets code, I noticed that the additional
> size of the hash table for hash aggregates is always zero, which seems
> incorrect to me. Attached is a patch to fix it; please help check it.

Indeed, that's incorrect. Causes the number of buckets for the hashtable
to be set higher - the size is just used for that. I'm a bit wary of
changing this in the stable branches - could cause performance changes?
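
For reference, the requested size only inflates the per-entry size used
to cap the initial bucket count against work_mem; a simplified sketch of
the relevant lines in BuildTupleHashTableExt() (execGrouping.c of that
era):

	Size	entrysize = sizeof(TupleHashEntryData) + additionalsize;

	/* limit the initial bucket request so the table fits in work_mem */
	nbuckets = Min(nbuckets, (long) ((work_mem * 1024L) / entrysize));

With additionalsize stuck at zero, entrysize is under-estimated, so the
cap permits more buckets than intended.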

Greetings,

Andres Freund

#3Justin Pryzby
pryzby@telsasoft.com
In reply to: Andres Freund (#2)
Re: Additional size of hash table is always zero for hash aggregates

On Thu, Mar 12, 2020 at 12:16:26PM -0700, Andres Freund wrote:

> On 2020-03-12 16:35:15 +0800, Pengzhou Tang wrote:
> > While reading the grouping sets code, I noticed that the additional
> > size of the hash table for hash aggregates is always zero, which seems
> > incorrect to me. Attached is a patch to fix it; please help check it.
>
> Indeed, that's incorrect. Causes the number of buckets for the hashtable
> to be set higher - the size is just used for that. I'm a bit wary of
> changing this in the stable branches - could cause performance changes?

I found that it was working when Andres implemented TupleHashTable, but
it broke at:

| b5635948ab Support hashed aggregation with grouping sets.

So it affects v11 and v12. entrysize isn't used for anything else.

--
Justin

#4Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Justin Pryzby (#3)
Re: Additional size of hash table is always zero for hash aggregates

"Justin" == Justin Pryzby <pryzby@telsasoft.com> writes:

> On Thu, Mar 12, 2020 at 12:16:26PM -0700, Andres Freund wrote:
> > Indeed, that's incorrect. Causes the number of buckets for the
> > hashtable to be set higher - the size is just used for that. I'm a
> > bit wary of changing this in the stable branches - could cause
> > performance changes?

I think (offhand, not tested) that the number of buckets would only be
affected if the (planner-supplied) numGroups value would cause work_mem
to be exceeded; the planner doesn't plan a hashagg at all in that case
unless forced to (grouping by a hashable but not sortable column). Note
that for various reasons the planner tends to over-estimate the memory
requirement anyway.

Or maybe if work_mem had been reduced between plan time and execution
time....

This is unlikely to be causing any issue in practice, so backpatching
may not be called for.

I'll deal with it in HEAD only, unless someone else has a burning desire
to take it.

--
Andrew (irc:RhodiumToad)

#5Andres Freund
andres@anarazel.de
In reply to: Andrew Gierth (#4)
Re: Additional size of hash table is always zero for hash aggregates

Hi,

On 2020-03-13 00:34:22 +0000, Andrew Gierth wrote:

"Justin" == Justin Pryzby <pryzby@telsasoft.com> writes:

On Thu, Mar 12, 2020 at 12:16:26PM -0700, Andres Freund wrote:

Indeed, that's incorrect. Causes the number of buckets for the
hashtable to be set higher - the size is just used for that. I'm a
bit wary of changing this in the stable branches - could cause
performance changes?

I think (offhand, not tested) that the number of buckets would only be
affected if the (planner-supplied) numGroups value would cause work_mem
to be exceeded; the planner doesn't plan a hashagg at all in that case
unless forced to (grouping by a hashable but not sortable column). Note
that for various reasons the planner tends to over-estimate the memory
requirement anyway.

That's a good point.

> Or maybe if work_mem had been reduced between plan time and execution
> time....
>
> This is unlikely to be causing any issue in practice, so backpatching
> may not be called for.

Sounds sane to me.

> I'll deal with it in HEAD only, unless someone else has a burning desire
> to take it.

Feel free.

I wonder if we should just remove the parameter though? I'm not sure
there's much point in having it, given it's just callers filling
->additionalstate. And the nbuckets is passed in externally anyway - so
there needs to have been a memory sizing determination previously
anyway? The other users just specify 0 already.

Greetings,

Andres Freund

#6Pengzhou Tang
ptang@pivotal.io
In reply to: Andres Freund (#2)
Re: Additional size of hash table is always zero for hash aggregates

> On 2020-03-12 16:35:15 +0800, Pengzhou Tang wrote:
> > While reading the grouping sets code, I noticed that the additional
> > size of the hash table for hash aggregates is always zero, which seems
> > incorrect to me. Attached is a patch to fix it; please help check it.
>
> Indeed, that's incorrect. Causes the number of buckets for the hashtable
> to be set higher - the size is just used for that. I'm a bit wary of
> changing this in the stable branches - could cause performance changes?

Thanks for confirming this.

#7Pengzhou Tang
ptang@pivotal.io
In reply to: Andrew Gierth (#4)
Re: Additional size of hash table is always zero for hash aggregates

On Fri, Mar 13, 2020 at 8:34 AM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> "Justin" == Justin Pryzby <pryzby@telsasoft.com> writes:
> > On Thu, Mar 12, 2020 at 12:16:26PM -0700, Andres Freund wrote:
> > > Indeed, that's incorrect. Causes the number of buckets for the
> > > hashtable to be set higher - the size is just used for that. I'm a
> > > bit wary of changing this in the stable branches - could cause
> > > performance changes?
>
> I think (offhand, not tested) that the number of buckets would only be
> affected if the (planner-supplied) numGroups value would cause work_mem
> to be exceeded; the planner doesn't plan a hashagg at all in that case
> unless forced to (grouping by a hashable but not sortable column). Note
> that for various reasons the planner tends to over-estimate the memory
> requirement anyway.

That makes sense, thanks.

#8Pengzhou Tang
ptang@pivotal.io
In reply to: Andres Freund (#2)
Re: Additional size of hash table is always zero for hash aggregates

Thanks, Andres Freund and Andrew Gierth.

Relatedly, may I invite you to help review the parallel grouping sets
patches? It would be great to hear some comments from you, since you
contributed most of the grouping sets code. The thread is:

/messages/by-id/CAG4reAQ8rFCc+i0oju3VjaW7xSOJAkvLrqa4F-NYZzAG4SW7iQ@mail.gmail.com

Thanks,
Pengzhou

On Fri, Mar 13, 2020 at 3:16 AM Andres Freund <andres@anarazel.de> wrote:

> Hi,
>
> On 2020-03-12 16:35:15 +0800, Pengzhou Tang wrote:
> > While reading the grouping sets code, I noticed that the additional
> > size of the hash table for hash aggregates is always zero, which seems
> > incorrect to me. Attached is a patch to fix it; please help check it.
>
> Indeed, that's incorrect. Causes the number of buckets for the hashtable
> to be set higher - the size is just used for that. I'm a bit wary of
> changing this in the stable branches - could cause performance changes?
>
> Greetings,
>
> Andres Freund

#9Jeff Davis
pgsql@j-davis.com
In reply to: Andrew Gierth (#4)
Re: Additional size of hash table is always zero for hash aggregates

On Fri, 2020-03-13 at 00:34 +0000, Andrew Gierth wrote:

"Justin" == Justin Pryzby <pryzby@telsasoft.com> writes:

On Thu, Mar 12, 2020 at 12:16:26PM -0700, Andres Freund wrote:

Indeed, that's incorrect. Causes the number of buckets for the
hashtable to be set higher - the size is just used for that. I'm

a

It's also used to set the 'entrysize' field of the TupleHashTable,
which doesn't appear to be used for anything? Maybe we should just
remove that field... it confused me for a moment as I was looking into
this.

bit wary of changing this in the stable branches - could cause
performance changes?

I think (offhand, not tested) that the number of buckets would only
be
affected if the (planner-supplied) numGroups value would cause
work_mem
to be exceeded; the planner doesn't plan a hashagg at all in that

Now that we have disk-based HashAgg, which already tries to choose the
number of buckets with work_mem in mind, and no other caller specifies a
non-zero additionalsize, why not just get rid of that argument
completely? It can still sanity-check against work_mem for the sake of
other callers, but it doesn't need 'additionalsize' to do so.

Or, we can keep the 'additionalsize' argument but put it to work storing
the AggStatePerGroupData inline in the hash table. That would allow us
to remove the 'additional' pointer from TupleHashEntryData, saving 8
bytes plus the chunk header for every group. That sounds very tempting.
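
Conceptually, the entry would shrink to something like this (a sketch
only; field names follow TupleHashEntryData in execnodes.h of that era,
and the inline trailing states are exactly what collides with
simplehash's fixed element size, as discussed below):

	typedef struct TupleHashEntryData
	{
		MinimalTuple firstTuple;	/* copy of first tuple in this group */
		uint32		status;			/* simplehash entry status */
		uint32		hash;			/* cached hash value */
		/* aggstate->numtrans AggStatePerGroupData structs would follow
		 * inline here, replacing the removed 'additional' pointer */
	} TupleHashEntryData;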

If we want to get even more clever, we could try to squish
AggStatePerGroupData into 8 bytes by putting the flags
(transValueIsNull and noTransValue) into unused bits of the Datum. That
would work if the transtype is by-ref (the low bits of the pointer will
be unused), or if the type's size is less than 8, or if the particular
aggregate doesn't need either of those booleans. It would get messy,
but saving 8 bytes per group is non-trivial.
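
A hypothetical sketch of the by-ref case (the macro and helper names
here are invented for illustration; MAXALIGN'd pointer Datums leave the
low bits free):

	#define TRANS_ISNULL	((uintptr_t) 0x1)	/* transValueIsNull */
	#define TRANS_NOVALUE	((uintptr_t) 0x2)	/* noTransValue */
	#define TRANS_FLAGMASK	((uintptr_t) 0x3)

	static inline Datum
	transvalue_datum(Datum packed)
	{
		/* mask off the flag bits to recover the pointer Datum */
		return packed & ~((Datum) TRANS_FLAGMASK);
	}

	static inline bool
	transvalue_isnull(Datum packed)
	{
		return (packed & TRANS_ISNULL) != 0;
	}

Every access then pays an extra mask, which is part of the messiness.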

Regards,
Jeff Davis

#10Andres Freund
andres@anarazel.de
In reply to: Jeff Davis (#9)
Re: Additional size of hash table is always zero for hash aggregates

Hi,

On 2020-03-21 17:45:31 -0700, Jeff Davis wrote:

> Or, we can keep the 'additionalsize' argument but put it to work storing
> the AggStatePerGroupData inline in the hash table. That would allow us
> to remove the 'additional' pointer from TupleHashEntryData, saving 8
> bytes plus the chunk header for every group. That sounds very tempting.

I don't see how? That'd require making the hash bucket addressing deal
with variable sizes, which'd be bad for performance reasons. Since there
can be aggstate->numtrans AggStatePerGroupDatas for each hash table
entry, I don't see how to avoid a variable size?

> If we want to get even more clever, we could try to squish
> AggStatePerGroupData into 8 bytes by putting the flags
> (transValueIsNull and noTransValue) into unused bits of the Datum. That
> would work if the transtype is by-ref (the low bits of the pointer will
> be unused), or if the type's size is less than 8, or if the particular
> aggregate doesn't need either of those booleans. It would get messy,
> but saving 8 bytes per group is non-trivial.

I'm somewhat doubtful it's worth going for those per-type optimizations
- the wins don't seem large enough relative to other per-group space
needs. It also adds extra instructions to fetching those values...

If we want to optimize memory usage, I think I'd first go for allocating
the group's "firstTuple" together with all the AggStatePerGroupDatas.

Greetings,

Andres Freund

#11Jeff Davis
pgsql@j-davis.com
In reply to: Andres Freund (#10)
Re: Additional size of hash table is always zero for hash aggregates

On Sat, 2020-03-21 at 18:26 -0700, Andres Freund wrote:

> I don't see how? That'd require making the hash bucket addressing deal
> with variable sizes, which'd be bad for performance reasons. Since there
> can be aggstate->numtrans AggStatePerGroupDatas for each hash table
> entry, I don't see how to avoid a variable size?

It would not vary for a given hash table. Do you mean the compile-time
specialization (of simplehash.h) would not work any more?

If we aren't storing the "additional" inline in the hash entry, I don't
see any purpose for the argument to BuildTupleHashTableExt(), nor the
purpose of the "entrysize" field of TupleHashTableData.

> If we want to optimize memory usage, I think I'd first go for allocating
> the group's "firstTuple" together with all the AggStatePerGroupDatas.

That's a good idea.

Regards,
Jeff Davis

#12Andres Freund
andres@anarazel.de
In reply to: Jeff Davis (#11)
Re: Additional size of hash table is always zero for hash aggregates

Hi,

On 2020-03-23 13:29:02 -0700, Jeff Davis wrote:

> On Sat, 2020-03-21 at 18:26 -0700, Andres Freund wrote:
> > I don't see how? That'd require making the hash bucket addressing deal
> > with variable sizes, which'd be bad for performance reasons. Since there
> > can be aggstate->numtrans AggStatePerGroupDatas for each hash table
> > entry, I don't see how to avoid a variable size?

> It would not vary for a given hash table. Do you mean the compile-time
> specialization (of simplehash.h) would not work any more?

Yes.

> If we aren't storing the "additional" inline in the hash entry, I don't
> see any purpose for the argument to BuildTupleHashTableExt(), nor the
> purpose of the "entrysize" field of TupleHashTableData.

Yea, that was my conclusion too. It looked like Andrew was going to
commit a fix, but that hasn't happened yet.

Greetings,

Andres Freund