Default setting for enable_hashagg_disk
This is just a placeholder thread for an open item that I'm adding to
the Open Items list. We can make a decision later.
Now that we have Disk-based Hash Aggregation, there are a lot more
situations where the planner can choose HashAgg. The
enable_hashagg_disk GUC, if set to true, chooses HashAgg based on
costing. If false, it only generates a HashAgg path if it thinks it
will fit in work_mem, similar to the old behavior (though it will now
spill to disk if the planner was wrong about it fitting in work_mem).
The current default is true.
I expect this to be a win in a lot of cases, obviously. But as with any
planner change, it will be wrong sometimes. We may want to be
conservative and set the default to false depending on the experience
during beta. I'm inclined to leave it as true for now though, because
that will give us better information upon which to base any decision.
Regards,
Jeff Davis
On Tue, Apr 07, 2020 at 11:20:46AM -0700, Jeff Davis wrote:
The enable_hashagg_disk GUC, if set to true, chooses HashAgg based on
costing. If false, it only generates a HashAgg path if it thinks it will fit
in work_mem, similar to the old behavior (though it will now spill to disk if
the planner was wrong about it fitting in work_mem). The current default is
true.
Are there any other GUCs that behave like that ? It's confusing to me when I
see "Disk Usage: ... kB", despite setting it to "disable", and without the
usual disable_cost. I realize that postgres chose the plan on the hypothesis
that it would *not* exceed work_mem, and that spilling to disk is considered
preferable to ignoring the setting, and that "going back" to planning phase
isn't a possibility.
template1=# explain (analyze, costs off, summary off) SELECT a, COUNT(1) FROM generate_series(1,999999) a GROUP BY 1 ;
 HashAggregate (actual time=1370.945..2877.250 rows=999999 loops=1)
   Group Key: a
   Peak Memory Usage: 5017 kB
   Disk Usage: 22992 kB
   HashAgg Batches: 84
   ->  Function Scan on generate_series a (actual time=314.507..741.517 rows=999999 loops=1)
A previous version of the docs said this, which I thought was confusing, and you removed it.
But I guess this is the behavior it was trying to .. explain.
+ <term><varname>enable_hashagg_disk</varname> (<type>boolean</type>)
+ ... This only affects the planner choice;
+ execution time may still require using disk-based hash
+ aggregation. The default is <literal>on</literal>.
I suggest that should be reworded and then re-introduced, unless there's some
further behavior change allowing the previous behavior of
might-exceed-work-mem.
"This setting determines whether the planner will elect to use a hash plan
which it expects will exceed work_mem and spill to disk. During execution,
hash nodes which exceed work_mem will spill to disk even if this setting is
disabled. To avoid spilling to disk, either increase work_mem (or set
enable_hashagg=off)."
For sure the release notes should recommend re-calibrating work_mem.
--
Justin
On Tue, Apr 07, 2020 at 05:39:01PM -0500, Justin Pryzby wrote:
On Tue, Apr 07, 2020 at 11:20:46AM -0700, Jeff Davis wrote:
The enable_hashagg_disk GUC, if set to true, chooses HashAgg based on
costing. If false, it only generates a HashAgg path if it thinks it will fit
in work_mem, similar to the old behavior (though it will now spill to disk if
the planner was wrong about it fitting in work_mem). The current default is
true.

Are there any other GUCs that behave like that ? It's confusing to me when I
see "Disk Usage: ... kB", despite setting it to "disable", and without the
usual disable_cost. I realize that postgres chose the plan on the hypothesis
that it would *not* exceed work_mem, and that spilling to disk is considered
preferable to ignoring the setting, and that "going back" to planning phase
isn't a possibility.
Is it really any different from our enable_* GUCs? Even if you do e.g.
enable_sort=off, we may still do a sort. Same for enable_groupagg etc.
template1=# explain (analyze, costs off, summary off) SELECT a, COUNT(1) FROM generate_series(1,999999) a GROUP BY 1 ;
 HashAggregate (actual time=1370.945..2877.250 rows=999999 loops=1)
   Group Key: a
   Peak Memory Usage: 5017 kB
   Disk Usage: 22992 kB
   HashAgg Batches: 84
   ->  Function Scan on generate_series a (actual time=314.507..741.517 rows=999999 loops=1)

A previous version of the docs said this, which I thought was confusing, and you removed it.
But I guess this is the behavior it was trying to .. explain.

+ <term><varname>enable_hashagg_disk</varname> (<type>boolean</type>)
+ ... This only affects the planner choice;
+ execution time may still require using disk-based hash
+ aggregation. The default is <literal>on</literal>.

I suggest that should be reworded and then re-introduced, unless there's some
further behavior change allowing the previous behavior of
might-exceed-work-mem.
Yeah, it would be good to mention this is a best-effort setting.
"This setting determines whether the planner will elect to use a hash plan
which it expects will exceed work_mem and spill to disk. During execution,
hash nodes which exceed work_mem will spill to disk even if this setting is
disabled. To avoid spilling to disk, either increase work_mem (or set
enable_hashagg=off)."

For sure the release notes should recommend re-calibrating work_mem.
I don't follow. Why would the recalibrating be needed?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Apr 09, 2020 at 01:48:55PM +0200, Tomas Vondra wrote:
On Tue, Apr 07, 2020 at 05:39:01PM -0500, Justin Pryzby wrote:
On Tue, Apr 07, 2020 at 11:20:46AM -0700, Jeff Davis wrote:
The enable_hashagg_disk GUC, if set to true, chooses HashAgg based on
costing. If false, it only generates a HashAgg path if it thinks it will fit
in work_mem, similar to the old behavior (though it will now spill to disk if
the planner was wrong about it fitting in work_mem). The current default is
true.

Are there any other GUCs that behave like that ? It's confusing to me when I
see "Disk Usage: ... kB", despite setting it to "disable", and without the
usual disable_cost. I realize that postgres chose the plan on the hypothesis
that it would *not* exceed work_mem, and that spilling to disk is considered
preferable to ignoring the setting, and that "going back" to planning phase
isn't a possibility.

Is it really any different from our enable_* GUCs? Even if you do e.g.
enable_sort=off, we may still do a sort. Same for enable_groupagg etc.
Those show that the GUC was disabled by showing disable_cost. That's what's
different about this one.
Also.. there's no such thing as enable_groupagg? Unless I've been missing out
on something.
"This setting determines whether the planner will elect to use a hash plan
which it expects will exceed work_mem and spill to disk. During execution,
hash nodes which exceed work_mem will spill to disk even if this setting is
disabled. To avoid spilling to disk, either increase work_mem (or set
enable_hashagg=off)."

For sure the release notes should recommend re-calibrating work_mem.
I don't follow. Why would the recalibrating be needed?
Because HashAgg plans which used to run fine (because they weren't prevented
from overflowing work_mem) might now run poorly after spilling to disk (because
of overflowing work_mem).
--
Justin
On Thu, 2020-04-09 at 12:24 -0500, Justin Pryzby wrote:
Also.. there's no such thing as enable_groupagg? Unless I've been
missing out on something.
I thought about adding that, and went so far as to make a patch. But it
didn't seem right to me -- the grouping isn't what takes the time, it's
the sorting. So what would the point of such a GUC be? To disable
GroupAgg when the input data is already sorted? Or a strange way to
disable Sort?
Because HashAgg plans which used to run fine (because they weren't
prevented from overflowing work_mem) might now run poorly after
spilling to disk (because of overflowing work_mem).
It's probably worth a mention in the release notes, but I wouldn't word
it too strongly. Typically the performance difference is not a lot if
the workload still fits in system memory.
Regards,
Jeff Davis
On Thu, Apr 9, 2020 at 7:49 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
It it really any different from our enable_* GUCs? Even if you do e.g.
enable_sort=off, we may still do a sort. Same for enable_groupagg etc.
I think it's actually pretty different. All of the other enable_* GUCs
disable an entire type of plan node, except for cases where that would
otherwise result in planning failure. This just disables a portion of
the planning logic for a certain kind of node, without actually
disabling the whole node type. I'm not sure that's a bad idea, but it
definitely seems to be inconsistent with what we've done in the past.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, 2020-04-09 at 15:26 -0400, Robert Haas wrote:
I think it's actually pretty different. All of the other enable_* GUCs
disable an entire type of plan node, except for cases where that would
otherwise result in planning failure. This just disables a portion of
the planning logic for a certain kind of node, without actually
disabling the whole node type. I'm not sure that's a bad idea, but it
definitely seems to be inconsistent with what we've done in the past.
The patch adds two GUCs. Both are slightly weird, to be honest, but let
me explain the reasoning. I am open to other suggestions.
1. enable_hashagg_disk (default true):
This is essentially there just to get some of the old behavior back, to
give people an escape hatch if they see bad plans while we are tweaking
the costing. The old behavior was weird, so this GUC is also weird.
Perhaps we can make this a compatibility GUC that we eventually drop? I
don't necessarily think this GUC would make sense, say, 5 versions from
now. I'm just trying to be conservative because I know that, even if
the plans are faster for 90% of people, the other 10% will be unhappy
and want a way to work around it.
2. enable_groupingsets_hash_disk (default false):
This is about how we choose which grouping sets to hash and which to
sort when generating mixed mode paths.
Even before this patch, there are quite a few paths that could be
generated. It tries to estimate the size of each grouping set's hash
table, and then sees how many it can fit in work_mem (knapsack), while
also taking advantage of any path keys, etc.
With Disk-based Hash Aggregation, in principle we can generate paths
representing any combination of hashing and sorting for the grouping
sets. But that would be overkill (and grow to a huge number of paths if
we have more than a handful of grouping sets). So I think the existing
planner logic for grouping sets is fine for now. We might come up with
a better approach later.
But that created a testing problem, because if the planner estimates
correctly, no hashed grouping sets will spill, and the spilling code
won't be exercised. This GUC makes the planner disregard which grouping
sets' hash tables will fit, making it much easier to exercise the
spilling code. Is there a better way I should be testing this code
path?
Regards,
Jeff Davis
On Thu, Apr 9, 2020 at 1:02 PM Jeff Davis <pgsql@j-davis.com> wrote:
2. enable_groupingsets_hash_disk (default false):
This is about how we choose which grouping sets to hash and which to
sort when generating mixed mode paths.

Even before this patch, there are quite a few paths that could be
generated. It tries to estimate the size of each grouping set's hash
table, and then sees how many it can fit in work_mem (knapsack), while
also taking advantage of any path keys, etc.

With Disk-based Hash Aggregation, in principle we can generate paths
representing any combination of hashing and sorting for the grouping
sets. But that would be overkill (and grow to a huge number of paths if
we have more than a handful of grouping sets). So I think the existing
planner logic for grouping sets is fine for now. We might come up with
a better approach later.

But that created a testing problem, because if the planner estimates
correctly, no hashed grouping sets will spill, and the spilling code
won't be exercised. This GUC makes the planner disregard which grouping
sets' hash tables will fit, making it much easier to exercise the
spilling code. Is there a better way I should be testing this code
path?
So, I was catching up on email and noticed the last email in this
thread.
I think I am not fully understanding what enable_groupingsets_hash_disk
does. Is it only for testing?
Using the tests you added to src/test/regress/sql/groupingsets.sql, I
did get a plan that looks like hashagg is spilling to disk (goes through
hashagg_spill_tuple() code path and has number of batches reported in
Explain) in a MixedAgg plan for a grouping sets query even with
enable_groupingsets_hash_disk set to false. You don't have the exact
query I tried (below) in the test suite, but it is basically what is
already there, so I must be missing something.
set enable_hashagg_disk = true;
SET enable_groupingsets_hash_disk = false;
SET work_mem='64kB';
set enable_hashagg = true;
set jit_above_cost = 0;
drop table if exists gs_hash_1;
create table gs_hash_1 as
select g1000, g100, g10, sum(g::numeric), count(*), max(g::text) from
(select g%1000 as g1000, g%100 as g100, g%10 as g10, g
from generate_series(0,199999) g) s
group by cube (g1000,g100,g10);
explain (analyze, costs off, timing off)
select g1000, g100, g10
from gs_hash_1 group by cube (g1000,g100,g10);
                          QUERY PLAN
--------------------------------------------------------------
 MixedAggregate (actual rows=9648 loops=1)
   Hash Key: g10
   Hash Key: g10, g1000
   Hash Key: g100
   Hash Key: g100, g10
   Group Key: g1000, g100, g10
   Group Key: g1000, g100
   Group Key: g1000
   Group Key: ()
   Peak Memory Usage: 233 kB
   Disk Usage: 1600 kB
   HashAgg Batches: 2333
   ->  Sort (actual rows=4211 loops=1)
         Sort Key: g1000, g100, g10
         Sort Method: external merge  Disk: 384kB
         ->  Seq Scan on gs_hash_1 (actual rows=4211 loops=1)
Anyway, when I throw in the stats trick that is used in join_hash.sql:
alter table gs_hash_1 set (autovacuum_enabled = 'false');
update pg_class set reltuples = 10 where relname = 'gs_hash_1';
I get a MixedAgg plan that doesn't have any Sort below and uses much
more disk.
                        QUERY PLAN
----------------------------------------------------------
 MixedAggregate (actual rows=4211 loops=1)
   Hash Key: g1000, g100, g10
   Hash Key: g1000, g100
   Hash Key: g1000
   Hash Key: g100, g10
   Hash Key: g100
   Hash Key: g10, g1000
   Hash Key: g10
   Group Key: ()
   Peak Memory Usage: 405 kB
   Disk Usage: 59712 kB
   HashAgg Batches: 4209
   ->  Seq Scan on gs_hash_1 (actual rows=200000 loops=1)
I'm not sure if this is more what you were looking for--or maybe I am
misunderstanding the guc.
--
Melanie Plageman
On Tue, Jun 09, 2020 at 06:20:13PM -0700, Melanie Plageman wrote:
On Thu, Apr 9, 2020 at 1:02 PM Jeff Davis <pgsql@j-davis.com> wrote:
2. enable_groupingsets_hash_disk (default false):
This is about how we choose which grouping sets to hash and which to
sort when generating mixed mode paths.

Even before this patch, there are quite a few paths that could be
generated. It tries to estimate the size of each grouping set's hash
table, and then sees how many it can fit in work_mem (knapsack), while
also taking advantage of any path keys, etc.

With Disk-based Hash Aggregation, in principle we can generate paths
representing any combination of hashing and sorting for the grouping
sets. But that would be overkill (and grow to a huge number of paths if
we have more than a handful of grouping sets). So I think the existing
planner logic for grouping sets is fine for now. We might come up with
a better approach later.

But that created a testing problem, because if the planner estimates
correctly, no hashed grouping sets will spill, and the spilling code
won't be exercised. This GUC makes the planner disregard which grouping
sets' hash tables will fit, making it much easier to exercise the
spilling code. Is there a better way I should be testing this code
path?

So, I was catching up on email and noticed the last email in this
thread.

I think I am not fully understanding what enable_groupingsets_hash_disk
does. Is it only for testing?
If so, it should be in category: "Developer Options".
Using the tests you added to src/test/regress/sql/groupingsets.sql, I
did get a plan that looks like hashagg is spilling to disk (goes through
hashagg_spill_tuple() code path and has number of batches reported in
Explain) in a MixedAgg plan for a grouping sets query even with
enable_groupingsets_hash_disk set to false.
I'm not sure if this is more what you were looking for--or maybe I am
misunderstanding the guc.
The behavior of the GUC is inconsistent with the other GUCs, which is
confusing. See also Robert's comments in this thread.
/messages/by-id/20200407223900.GT2228@telsasoft.com
The old (pre-13) behavior was:
- work_mem is the amount of RAM to which each query node tries to constrain
itself, and the planner will reject a plan if it's expected to exceed that.
...But a chosen plan might exceed work_mem anyway.
The new behavior in v13 seems to be:
- HashAgg now respects work_mem, but instead enable*hash_disk are
opportunistic. A node which is *expected* to spill to disk will be
rejected.
...But at execution time, a node which exceeds work_mem will be spilled.
If someone sees a plan which spills to disk and wants to improve performance by
avoiding spilling, they might SET enable_hashagg_disk=off, which might do what
they want (if the plan is rejected at plan time), or it might not, which I
think will be a surprise every time.
If someone agrees, I suggest to add this as an Opened Item.
Maybe some combination of these would be an improvement:
- change documentation to emphasize behavior;
- change EXPLAIN output to make it obvious this isn't misbehaving;
- rename the GUC to not start with enable_* (work_mem_exceed?)
- rename the GUC *values* to something other than on/off. On/Planner?
- change the GUC to behave like it sounds like it should, which means "off"
would allow the pre-13 behavior of exceeding work_mem.
- Maybe make it ternary, like:
exceed_work_mem: {spill_disk, planner_reject, allow}
--
Justin
On Tue, Jun 9, 2020 at 7:15 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Tue, Jun 09, 2020 at 06:20:13PM -0700, Melanie Plageman wrote:
On Thu, Apr 9, 2020 at 1:02 PM Jeff Davis <pgsql@j-davis.com> wrote:
2. enable_groupingsets_hash_disk (default false):
This is about how we choose which grouping sets to hash and which to
sort when generating mixed mode paths.

Even before this patch, there are quite a few paths that could be
generated. It tries to estimate the size of each grouping set's hash
table, and then sees how many it can fit in work_mem (knapsack), while
also taking advantage of any path keys, etc.

With Disk-based Hash Aggregation, in principle we can generate paths
representing any combination of hashing and sorting for the grouping
sets. But that would be overkill (and grow to a huge number of paths if
we have more than a handful of grouping sets). So I think the existing
planner logic for grouping sets is fine for now. We might come up with
a better approach later.

But that created a testing problem, because if the planner estimates
correctly, no hashed grouping sets will spill, and the spilling code
won't be exercised. This GUC makes the planner disregard which grouping
sets' hash tables will fit, making it much easier to exercise the
spilling code. Is there a better way I should be testing this code
path?

So, I was catching up on email and noticed the last email in this
thread.

I think I am not fully understanding what enable_groupingsets_hash_disk
does. Is it only for testing?

If so, it should be in category: "Developer Options".

Using the tests you added to src/test/regress/sql/groupingsets.sql, I
did get a plan that looks like hashagg is spilling to disk (goes through
hashagg_spill_tuple() code path and has number of batches reported in
Explain) in a MixedAgg plan for a grouping sets query even with
enable_groupingsets_hash_disk set to false.

I'm not sure if this is more what you were looking for--or maybe I am
misunderstanding the guc.

The behavior of the GUC is inconsistent with the other GUCs, which is
confusing. See also Robert's comments in this thread.
/messages/by-id/20200407223900.GT2228@telsasoft.com

The old (pre-13) behavior was:
- work_mem is the amount of RAM to which each query node tries to constrain
itself, and the planner will reject a plan if it's expected to exceed that.
...But a chosen plan might exceed work_mem anyway.

The new behavior in v13 seems to be:
- HashAgg now respects work_mem, but instead enable*hash_disk are
opportunistic. A node which is *expected* to spill to disk will be
rejected.
...But at execution time, a node which exceeds work_mem will be spilled.

If someone sees a plan which spills to disk and wants to improve performance by
avoiding spilling, they might SET enable_hashagg_disk=off, which might do what
they want (if the plan is rejected at plan time), or it might not, which I
think will be a surprise every time.
But I thought that the enable_groupingsets_hash_disk GUC allows us to
test the following scenario:
The following is true:
- planner thinks grouping sets' hashtables table would fit in memory
(spilling is *not* expected)
- user is okay with spilling
- some grouping keys happen to be sortable and some hashable
The following happens:
- Planner generates some HashAgg grouping sets paths
- A MixedAgg plan is created
- During execution of the MixedAgg plan, one or more grouping sets'
hashtables would exceed work_mem, so the executor spills those tuples
to disk instead of exceeding work_mem
Especially given the code and comment:

    /*
     * If we have sortable columns to work with (gd->rollups is non-empty)
     * and enable_groupingsets_hash_disk is disabled, don't generate
     * hash-based paths that will exceed work_mem.
     */
    if (!enable_groupingsets_hash_disk &&
        hashsize > work_mem * 1024L && gd->rollups)
        return;                 /* nope, won't fit */
If this is the scenario that the GUC is designed to test, it seems like
you could exercise it without the enable_groupingsets_hash_disk GUC by
lying about the stats, no?
--
Melanie Plageman
On Tue, 2020-06-09 at 18:20 -0700, Melanie Plageman wrote:
So, I was catching up on email and noticed the last email in this
thread.

I think I am not fully understanding what enable_groupingsets_hash_disk
does. Is it only for testing?
It's mostly for testing. I could imagine cases where it would be useful
to force groupingsets to use the disk, but I mainly wanted the setting
there for testing the grouping sets hash disk code path.
Using the tests you added to src/test/regress/sql/groupingsets.sql, I
did get a plan that looks like hashagg is spilling to disk (goes through
I had something that worked as a test for a while, but then when I
tweaked the costing, it started using the Sort path (therefore not
testing my grouping sets hash disk code at all) and a bug crept in. So
I thought it would be best to have a more forceful knob.
Perhaps I should just get rid of that GUC and use the stats trick?
Regards,
Jeff Davis
On Tue, 2020-06-09 at 21:15 -0500, Justin Pryzby wrote:
The behavior of the GUC is inconsistent with the other GUCs, which is
confusing. See also Robert's comments in this thread.
/messages/by-id/20200407223900.GT2228@telsasoft.com
enable_* GUCs are planner GUCs, so it would be confusing to me if they
affected execution-time behavior.
I think the point of confusion is that it's not enabling/disabling an
entire execution node; it only "disables" HashAgg if it thinks it will
spill. I agree that is a difference with the other GUCs, and could
cause confusion.
Stepping back, I was trying to solve two problems with these GUCs:
1. Testing the spilling of hashed grouping sets: I'm inclined to just
get rid of enable_groupingsets_hash_disk and use Melanie's stats-
hacking approach instead.
2. Trying to provide an escape hatch for someone who experiences a
performance regression and wants something like the old behavior back.
There are two aspects of the old behavior that a user could potentially
want back:
a. Don't choose HashAgg if it's expected to have more groups than fit
into a work_mem-sized hashtable.
b. If executing HashAgg, and the hash table exceeds work_mem, just
keep going.
The behavior in v13 master is, by default, analogous to Sort or
anything else that adapts at runtime to spill. If we had spillable
HashAgg the whole time, we wouldn't be worried about #2 at all. But,
out of conservatism, I am trying to accommodate users who want an
escape hatch, at least for a release or two until users feel more
comfortable with disk-based HashAgg.
Setting enable_hash_disk=false implements 2(a). This name apparently
causes confusion, but it's hard to come up with a better one because
the v12 behavior has nuance that's hard to express succinctly. I don't
think the names you suggested quite fit, but the idea to use a more
interesting GUC value might help express the behavior. Perhaps making
enable_hashagg a ternary "enable_hashagg=on|off|avoid_disk"? The word
"reject" is too definite for the planner, which is working with
imperfect information.
In master, there is no explicit way to get 2(b), but you can just set
work_mem higher in a lot of cases. If enough people want 2(b), I can
add it easily. Perhaps hashagg_overflow=on|off, which would control
execution time behavior?
Regards,
Jeff Davis
On Tue, 2020-04-07 at 11:20 -0700, Jeff Davis wrote:
Now that we have Disk-based Hash Aggregation, there are a lot more
situations where the planner can choose HashAgg. The
enable_hashagg_disk GUC, if set to true, chooses HashAgg based on
costing. If false, it only generates a HashAgg path if it thinks it
will fit in work_mem, similar to the old behavior (though it will now
spill to disk if the planner was wrong about it fitting in work_mem).
The current default is true.

I expect this to be a win in a lot of cases, obviously. But as with any
planner change, it will be wrong sometimes. We may want to be
conservative and set the default to false depending on the experience
during beta. I'm inclined to leave it as true for now though, because
that will give us better information upon which to base any decision.
A compromise may be to multiply the disk costs for HashAgg by, e.g. a
1.5 - 2X penalty. That would make the plan changes less abrupt, and may
mitigate some of the concerns about I/O patterns that Tomas raised
here:
/messages/by-id/20200519151202.u2p2gpiawoaznsv2@development
The issues were improved a lot, but it will take us a while to really
tune the IO behavior as well as Sort.
Regards,
Jeff Davis
On Wed, Jun 10, 2020 at 10:39 AM Jeff Davis <pgsql@j-davis.com> wrote:
On Tue, 2020-06-09 at 18:20 -0700, Melanie Plageman wrote:
So, I was catching up on email and noticed the last email in this
thread.

I think I am not fully understanding what enable_groupingsets_hash_disk
does. Is it only for testing?

It's mostly for testing. I could imagine cases where it would be useful
to force groupingsets to use the disk, but I mainly wanted the setting
there for testing the grouping sets hash disk code path.

Using the tests you added to src/test/regress/sql/groupingsets.sql, I
did get a plan that looks like hashagg is spilling to disk (goes through

I had something that worked as a test for a while, but then when I
tweaked the costing, it started using the Sort path (therefore not
testing my grouping sets hash disk code at all) and a bug crept in. So
I thought it would be best to have a more forceful knob.

Perhaps I should just get rid of that GUC and use the stats trick?
I like the idea of doing the stats trick. For extra security, you could
throw in that other trick that is used in groupingsets.sql and make some
of the grouping columns unhashable and some unsortable so you know that
you will not pick only the Sort Path and do just a GroupAgg.
This slight modification of my previous example will probably yield
consistent results:
set enable_hashagg_disk = true;
SET enable_groupingsets_hash_disk = false;
SET work_mem='64kB';
SET enable_hashagg = true;
drop table if exists gs_hash_1;
create table gs_hash_1 as
select g%1000 as g1000, g%100 as g100, g%10 as g10, g,
g::text::xid as g_unsortable, g::bit(4) as g_unhashable
from generate_series(0,199999) g;
analyze gs_hash_1;
alter table gs_hash_1 set (autovacuum_enabled = 'false');
update pg_class set reltuples = 10 where relname = 'gs_hash_1';
explain (analyze, costs off, timing off)
select g1000, g100, g10
from gs_hash_1
group by grouping sets ((g1000,g100), (g10, g_unhashable), (g100,
g_unsortable));
                           QUERY PLAN
----------------------------------------------------------------
 MixedAggregate (actual rows=201080 loops=1)
   Hash Key: g100, g_unsortable
   Group Key: g1000, g100
   Sort Key: g10, g_unhashable
     Group Key: g10, g_unhashable
   Peak Memory Usage: 109 kB
   Disk Usage: 13504 kB
   HashAgg Batches: 10111
   ->  Sort (actual rows=200000 loops=1)
         Sort Key: g1000, g100
         Sort Method: external merge  Disk: 9856kB
         ->  Seq Scan on gs_hash_1 (actual rows=200000 loops=1)
While we are on the topic of the tests, I was wondering if you had
considered making a user defined type that had a lot of padding so that
the tests could use fewer rows. I did this for adaptive hashjoin and it
helped me with iteration time.
I don't know if that would still be the kind of test you are looking for
since a user probably wouldn't have a couple hundred really fat
untoasted tuples, but, I just thought I would check if that would be
useful.
--
Melanie Plageman
On Wed, 2020-06-10 at 11:39 -0700, Jeff Davis wrote:
1. Testing the spilling of hashed grouping sets: I'm inclined to just
get rid of enable_groupingsets_hash_disk and use Melanie's stats-
hacking approach instead.
Fixed in 92c58fd9.
think the names you suggested quite fit, but the idea to use a more
interesting GUC value might help express the behavior. Perhaps making
enable_hashagg a ternary "enable_hashagg=on|off|avoid_disk"? The word
"reject" is too definite for the planner, which is working with
imperfect information.
I renamed enable_hashagg_disk to hashagg_avoid_disk_plan, which I think
satisfies the concerns raised here. Also in 92c58fd9.
There is still the original topic of this thread, which is whether we
need to change the default value of this GUC, or penalize disk-based
HashAgg in some way, to be more conservative about plan changes in v13.
I think we can wait a little longer to make a decision there.
Regards,
Jeff Davis
On Thu, Jun 11, 2020 at 01:22:57PM -0700, Jeff Davis wrote:
On Wed, 2020-06-10 at 11:39 -0700, Jeff Davis wrote:
1. Testing the spilling of hashed grouping sets: I'm inclined to just
get rid of enable_groupingsets_hash_disk and use Melanie's stats-
hacking approach instead.

Fixed in 92c58fd9.

think the names you suggested quite fit, but the idea to use a more
interesting GUC value might help express the behavior. Perhaps making
enable_hashagg a ternary "enable_hashagg=on|off|avoid_disk"? The word
"reject" is too definite for the planner, which is working with
imperfect information.

I renamed enable_hashagg_disk to hashagg_avoid_disk_plan, which I think
satisfies the concerns raised here. Also in 92c58fd9.
Thanks for considering :)
I saw you updated the Open Items page, but put the items into "Older Bugs /
Fixed".
I moved them underneath "Resolved" since they're all new in v13.
https://wiki.postgresql.org/index.php?title=PostgreSQL_13_Open_Items&diff=34995&oldid=34994
--
Justin
On Thu, 9 Apr 2020 at 13:24, Justin Pryzby <pryzby@telsasoft.com> wrote:
On Thu, Apr 09, 2020 at 01:48:55PM +0200, Tomas Vondra wrote:
Is it really any different from our enable_* GUCs? Even if you do e.g.
enable_sort=off, we may still do a sort. Same for enable_groupagg etc.

Those show that the GUC was disabled by showing disable_cost. That's
what's different about this one.
Fwiw in the past this was seen not so much as a positive thing but a bug to
be fixed. We've talked about carrying a boolean "disabled plan" flag which
would be treated as a large cost penalty but not actually be added to the
cost in the plan.
The problems with the disable_cost in the cost are (at least):
1) It causes the resulting costs to be useless for comparing the plan costs
with other plans.
2) It can cause other planning decisions to be distorted in strange
non-linear ways.
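A minimal sketch of the distinction being made here (illustrative Python, not PostgreSQL source; the function and field names are invented for the example). Folding disable_cost into the cost itself still steers the planner away from the disabled path, but it destroys the reported cost; carrying a separate "disabled" boolean that sorts before the cost would keep the cost meaningful:

```python
# Hypothetical comparison of two ways to penalize a "disabled" plan.
DISABLE_COST = 1.0e10  # large constant added to disabled paths today

def cheaper_with_penalty(a, b):
    """Current approach: disable_cost is folded into the cost itself,
    so a disabled path's reported cost becomes meaningless."""
    cost_a = a["cost"] + (DISABLE_COST if a["disabled"] else 0.0)
    cost_b = b["cost"] + (DISABLE_COST if b["disabled"] else 0.0)
    return a if cost_a <= cost_b else b

def cheaper_with_flag(a, b):
    """Proposed approach: compare a boolean 'disabled' flag first, then
    the undistorted cost, so costs stay comparable across plans."""
    key_a = (a["disabled"], a["cost"])
    key_b = (b["disabled"], b["cost"])
    return a if key_a <= key_b else b

sort_plan = {"name": "Sort", "cost": 2000.0, "disabled": True}
hash_plan = {"name": "HashAgg", "cost": 3000.0, "disabled": False}

# Both schemes prefer HashAgg here, but only the flag scheme leaves
# sort_plan's true cost (2000.0) intact for other comparisons.
```

Both of Greg's problems follow from the penalty version: the 1e10 swamps the real cost (problem 1), and upper plan nodes doing arithmetic on that inflated cost make distorted choices (problem 2).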
--
greg
On Thu, Jun 11, 2020 at 01:22:57PM -0700, Jeff Davis wrote:
think the names you suggested quite fit, but the idea to use a more
interesting GUC value might help express the behavior. Perhaps making
enable_hashagg a ternary "enable_hashagg=on|off|avoid_disk"? The word
"reject" is too definite for the planner, which is working with
imperfect information.I renamed enable_hashagg_disk to hashagg_avoid_disk_plan, which I think
satisfies the concerns raised here. Also in 92c58fd9.
I think this should be re-arranged to be in alphabetical order
https://www.postgresql.org/docs/devel/runtime-config-query.html
--
Justin
On Wed, Jun 10, 2020 at 2:39 PM Jeff Davis <pgsql@j-davis.com> wrote:
The behavior in v13 master is, by default, analogous to Sort or
anything else that adapts at runtime to spill. If we had spillable
HashAgg the whole time, we wouldn't be worried about #2 at all. But,
out of conservatism, I am trying to accommodate users who want an
escape hatch, at least for a release or two until users feel more
comfortable with disk-based HashAgg.

Setting enable_hash_disk=false implements 2(a). This name apparently
causes confusion, but it's hard to come up with a better one because
the v12 behavior has nuance that's hard to express succinctly. I don't
think the names you suggested quite fit, but the idea to use a more
interesting GUC value might help express the behavior. Perhaps making
enable_hashagg a ternary "enable_hashagg=on|off|avoid_disk"? The word
"reject" is too definite for the planner, which is working with
imperfect information.

In master, there is no explicit way to get 2(b), but you can just set
work_mem higher in a lot of cases. If enough people want 2(b), I can
add it easily. Perhaps hashagg_overflow=on|off, which would control
execution time behavior?
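The proposed three-valued GUC could be sketched roughly as follows (illustrative Python, not PostgreSQL source; the function name and return shape are invented for the example):

```python
# Hypothetical semantics of a ternary enable_hashagg=on|off|avoid_disk.
def consider_hashagg(mode, est_hash_mem_kb, work_mem_kb):
    """Decide whether to generate a HashAgg path and whether to tack
    disable_cost onto it, under the proposed three-valued setting."""
    if mode == "off":
        # Today's enable_* behavior: still generate the path, but
        # penalize it with disable_cost.
        return {"generate": True, "add_disable_cost": True}
    if mode == "avoid_disk":
        # v12-like behavior: only offer HashAgg when the planner
        # expects it to fit in work_mem (it may still spill at
        # execution time if that estimate was wrong).
        fits = est_hash_mem_kb <= work_mem_kb
        return {"generate": fits, "add_disable_cost": False}
    # mode == "on": purely cost-based choice
    return {"generate": True, "add_disable_cost": False}
```

The "avoid_disk" branch captures the nuance discussed above: it is a planning-time preference, not an execution-time guarantee, which is why "reject" would overstate it.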
Planner GUCs are a pretty blunt instrument for solving problems that
users may have with planner features. There's no guarantee that the
experience a user has with one query will be the same as the
experience they have with another query, or even that you couldn't
have a single query which contains two different nodes where the
optimal behavior is different for one than it is for the other. In the
first case, changing the value of the GUC on a per-query basis is
pretty painful; in the second case, even that is not good enough. So,
as Tom has said before, the only really good choice in a case like
this is for the planner to figure out the right things automatically;
anything that boils down to a user-provided hint pretty well sucks.
So I feel like the really important thing here is to fix the cases
that don't come out well with default settings. If we can't do that,
then the feature is half-baked and maybe should not have been
committed in the first place. If we can, then we don't really need the
GUC, let alone multiple GUCs. I understand that some of the reason you
added these was out of paranoia, and I get that: it's hard to be sure
that any feature of this complexity isn't going to have some rough
patches, especially given how defective work_mem is as a model in
general. Still, we don't want to end up with 50 planner GUCs enabling
and disabling individual bits of various planner nodes, or at least I
don't think we do, so I'm very skeptical of the idea that we need 2
just for this feature. That doesn't feel scalable. I think the right
number is 0 or 1, and if it's 1, very few people should be changing
the default. If anything else is the case, then IMHO the feature isn't
ready to ship.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jun 22, 2020 at 10:52:37AM -0400, Robert Haas wrote:
On Wed, Jun 10, 2020 at 2:39 PM Jeff Davis <pgsql@j-davis.com> wrote:
The behavior in v13 master is, by default, analogous to Sort or
anything else that adapts at runtime to spill. If we had spillable
HashAgg the whole time, we wouldn't be worried about #2 at all. But,
out of conservatism, I am trying to accommodate users who want an
escape hatch, at least for a release or two until users feel more
comfortable with disk-based HashAgg.

Setting enable_hash_disk=false implements 2(a). This name apparently
causes confusion, but it's hard to come up with a better one because
the v12 behavior has nuance that's hard to express succinctly. I don't
think the names you suggested quite fit, but the idea to use a more
interesting GUC value might help express the behavior. Perhaps making
enable_hashagg a ternary "enable_hashagg=on|off|avoid_disk"? The word
"reject" is too definite for the planner, which is working with
imperfect information.

In master, there is no explicit way to get 2(b), but you can just set
work_mem higher in a lot of cases. If enough people want 2(b), I can
add it easily. Perhaps hashagg_overflow=on|off, which would control
execution time behavior?

don't think we do, so I'm very skeptical of the idea that we need 2
just for this feature. That doesn't feel scalable. I think the right
number is 0 or 1, and if it's 1, very few people should be changing
the default. If anything else is the case, then IMHO the feature isn't
ready to ship.
This was addressed in 92c58fd94801dd5c81ee20e26c5bb71ad64552a8
https://wiki.postgresql.org/index.php?title=PostgreSQL_13_Open_Items&diff=34994&oldid=34993
--
Justin