Removing "long int"-related limit on hash table sizes
Per the discussion at [1], users on Windows are seeing nasty performance
losses in v13/v14 (compared to prior releases) for hash aggregations that
required somewhat more than 2GB in the prior releases. That's because
they spill to disk where they did not before. The easy answer of "raise
hash_mem_multiplier" doesn't help, because on Windows the product of
work_mem and hash_mem_multiplier is clamped to 2GB, thanks to the ancient
decision to do a lot of memory-space-related calculations in "long int",
which is only 32 bits on Win64.
While I don't personally have the interest to fix that altogether,
it does seem like we've got a performance regression that we ought
to do something about immediately. So I took a look at getting rid of
this restriction for calculations associated with hash_mem_multiplier,
and it doesn't seem to be too bad. I propose the attached patch.
(This is against HEAD; there are minor conflicts in v13 and v14.)
A couple of notes:
* I did not change most of the comments referring to "hash_mem",
even though that's not really a thing anymore. They seem readable
enough anyway, and I failed to think of a reasonably-short substitute.
* We should drop get_hash_mem() altogether in HEAD and maybe v14.
I figure we'd better leave it available in v13, though, in case
any outside code is using it.
Comments?
regards, tom lane
[1]: /messages/by-id/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com
Attachments:
fix-long-int-calcs-for-hash-mem.patch (text/x-diff; charset=us-ascii) +173 -108
Hi,
On 2021-07-23 17:15:24 -0400, Tom Lane wrote:
Per the discussion at [1], users on Windows are seeing nasty performance
losses in v13/v14 (compared to prior releases) for hash aggregations that
required somewhat more than 2GB in the prior releases.
Ugh :(.
That's because they spill to disk where they did not before. The easy
answer of "raise hash_mem_multiplier" doesn't help, because on Windows
the product of work_mem and hash_mem_multiplier is clamped to 2GB,
thanks to the ancient decision to do a lot of memory-space-related
calculations in "long int", which is only 32 bits on Win64.
We really ought to just remove every single use of long. As Thomas
quipped on twitter at some point, "long is the asbestos of C". I think
we've incurred far more cost due to weird workarounds to deal with the
difference in long width between windows and everything else, than just
removing all use of it outright would incur.
And perhaps once we've done that, we should experiment with putting
__attribute__((deprecated)) on long, but conditionalize it so it's only
used for building PG internal stuff, and doesn't leak into pg_config
output. Perhaps it'll be too painful due to external headers, but it
seems worth trying.
But obviously that doesn't help with the issue in the release branches.
While I don't personally have the interest to fix that altogether,
it does seem like we've got a performance regression that we ought
to do something about immediately. So I took a look at getting rid of
this restriction for calculations associated with hash_mem_multiplier,
and it doesn't seem to be too bad. I propose the attached patch.
(This is against HEAD; there are minor conflicts in v13 and v14.)
Hm. I wonder if we would avoid some overflow dangers on 32bit systems if
we made get_hash_memory_limit() and the relevant variables 64 bit,
rather than 32bit / size_t. E.g.
@@ -700,9 +697,9 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 	inner_rel_bytes = ntuples * tupsize;
 
 	/*
-	 * Target in-memory hashtable size is hash_mem kilobytes.
+	 * Compute in-memory hashtable size limit from GUCs.
 	 */
-	hash_table_bytes = hash_mem * 1024L;
+	hash_table_bytes = get_hash_memory_limit();
 
 	/*
 	 * Parallel Hash tries to use the combined hash_mem of all workers to
@@ -710,7 +707,14 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 	 * per worker and tries to process batches in parallel.
 	 */
 	if (try_combined_hash_mem)
-		hash_table_bytes += hash_table_bytes * parallel_workers;
+	{
+		/* Careful, this could overflow size_t */
+		double		newlimit;
+
+		newlimit = (double) hash_table_bytes * (double) (parallel_workers + 1);
+		newlimit = Min(newlimit, (double) SIZE_MAX);
+		hash_table_bytes = (size_t) newlimit;
+	}
Wouldn't need to be as careful, I think?
@@ -740,12 +747,26 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 	 *		size of skew bucket struct itself
 	 *----------
 	 */
-	*num_skew_mcvs = skew_table_bytes / (tupsize +
-										 (8 * sizeof(HashSkewBucket *)) +
-										 sizeof(int) +
-										 SKEW_BUCKET_OVERHEAD);
-	if (*num_skew_mcvs > 0)
-		hash_table_bytes -= skew_table_bytes;
+	bytes_per_mcv = tupsize +
+		(8 * sizeof(HashSkewBucket *)) +
+		sizeof(int) +
+		SKEW_BUCKET_OVERHEAD;
+	skew_mcvs = hash_table_bytes / bytes_per_mcv;
+
+	/*
+	 * Now scale by SKEW_HASH_MEM_PERCENT (we do it in this order so as
+	 * not to worry about size_t overflow in the multiplication)
+	 */
+	skew_mcvs = skew_mcvs * SKEW_HASH_MEM_PERCENT / 100;
I always have to think about the evaluation order of things like this
(it's left to right for these), so I'd prefer parens around the
multiplication. I did wonder briefly whether the SKEW_HASH_MEM_PERCENT /
100 just evaluates to 0...
Looks like a good idea to me.
Greetings,
Andres Freund
On Sat, Jul 24, 2021 at 06:25:53PM -0700, Andres Freund wrote:
That's because they spill to disk where they did not before. The easy
answer of "raise hash_mem_multiplier" doesn't help, because on Windows
the product of work_mem and hash_mem_multiplier is clamped to 2GB,
thanks to the ancient decision to do a lot of memory-space-related
calculations in "long int", which is only 32 bits on Win64.

We really ought to just remove every single use of long. As Thomas
quipped on twitter at some point, "long is the asbestos of C". I think
we've incurred far more cost due to weird workarounds to deal with the
difference in long width between windows and everything else, than just
removing all use of it outright would incur.
+1
As I understand it, making long of undetermined length was to allow
someone to choose a data type that _might_ be longer than int if the
compiler/OS/CPU was optimized for that, but at this point, such
optimizations just don't seem to make sense, and we know every(?) CPU
supports long-long, so why not go for something concrete? Do we really
want our feature limits to be determined by whether we have an optimized
type longer than int?
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
Andres Freund <andres@anarazel.de> writes:
On 2021-07-23 17:15:24 -0400, Tom Lane wrote:
That's because they spill to disk where they did not before. The easy
answer of "raise hash_mem_multiplier" doesn't help, because on Windows
the product of work_mem and hash_mem_multiplier is clamped to 2GB,
thanks to the ancient decision to do a lot of memory-space-related
calculations in "long int", which is only 32 bits on Win64.
We really ought to just remove every single use of long.
I have no objection to that as a long-term goal. But I'm not volunteering
to do all the work, and in any case it wouldn't be a back-patchable fix.
I feel that we do need to do something about this performance regression
in v13.
Hm. I wonder if we would avoid some overflow dangers on 32bit systems if
we made get_hash_memory_limit() and the relevant variables 64 bit,
rather than 32bit / size_t. E.g.
No, I don't like that. Using size_t for memory-size variables is good
discipline. Moreover, I'm not convinced that even with 64-bit ints,
overflow would be impossible in all the places I fixed here. They're
multiplying several potentially very large values (one of which
is a float). I think this is just plain sloppy coding, independently
of which bit-width you choose to be sloppy in.
+ skew_mcvs = skew_mcvs * SKEW_HASH_MEM_PERCENT / 100;
I always have to think about the evaluation order of things like this
(it's left to right for these), so I'd prefer parens around the
multiplication. I did wonder briefly whether the SKEW_HASH_MEM_PERCENT /
100 just evaluates to 0...
OK, will do. I see your point, because I'd sort of instinctively
wanted to write that as
skew_mcvs *= SKEW_HASH_MEM_PERCENT / 100;
which of course would not work.
Thanks for looking at the code.
regards, tom lane
On Sun, Jul 25, 2021 at 1:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
On 2021-07-23 17:15:24 -0400, Tom Lane wrote:
That's because they spill to disk where they did not before. The easy
answer of "raise hash_mem_multiplier" doesn't help, because on Windows
the product of work_mem and hash_mem_multiplier is clamped to 2GB,
thanks to the ancient decision to do a lot of memory-space-related
calculations in "long int", which is only 32 bits on Win64.

We really ought to just remove every single use of long.
I have no objection to that as a long-term goal. But I'm not volunteering
to do all the work, and in any case it wouldn't be a back-patchable fix.
I'm a volunteer, if you want to work together.
I think int64 is in most cases the counterpart of *long* on Windows.
regards,
Ranier Vilela
Ranier Vilela <ranier.vf@gmail.com> writes:
I think int64 is in most cases the counterpart of *long* on Windows.
I'm not particularly on board with s/long/int64/g as a universal
solution. I think that most of these usages are concerned with
memory sizes and would be better off as "size_t". We might need
int64 in places where we're concerned with sums of memory usage
across processes, or where the value needs to be allowed to be
negative. So it'll take case-by-case analysis to do it right.
BTW, one aspect of this that I'm unsure how to tackle is the
common usage of "L" constants; in particular, "work_mem * 1024L"
is a really common idiom that we'll need to get rid of. Not sure
that grep will be a useful aid for finding those.
regards, tom lane
On Sun, Jul 25, 2021 at 3:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ranier Vilela <ranier.vf@gmail.com> writes:
I think int64 is in most cases the counterpart of *long* on Windows.
I'm not particularly on board with s/long/int64/g as a universal
solution.
Sure, not a universal solution, I mean a start point.
When I look for a type that is signed and 8 bytes wide on Windows, I only
see int64.
I think that most of these usages are concerned with
memory sizes and would be better off as "size_t".
Ok, but let's not forget that size_t is unsigned.
We might need
int64 in places where we're concerned with sums of memory usage
across processes, or where the value needs to be allowed to be
negative. So it'll take case-by-case analysis to do it right.
Sure.
BTW, one aspect of this that I'm unsure how to tackle is the
common usage of "L" constants; in particular, "work_mem * 1024L"
is a really common idiom that we'll need to get rid of. Not sure
that grep will be a useful aid for finding those.
I can see 30 matches in the head tree. (grep -d "1024L" *.c)
File backend\access\gin\ginfast.c:
if (metadata->nPendingPages * GIN_PAGE_FREESIZE > cleanupSize * 1024L)
(accum.allocatedMemory >= workMemory * 1024L)))
Is it a good point to start?
or one more simple?
(src/backend/access/hash/hash.c) has one *long*.
regards,
Ranier Vilela
On Sun, Jul 25, 2021 at 12:28:04PM -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
We really ought to just remove every single use of long.
I have no objection to that as a long-term goal. But I'm not volunteering
to do all the work, and in any case it wouldn't be a back-patchable fix.
I feel that we do need to do something about this performance regression
in v13.
Another idea may be to be more aggressive in c.h? A tweak there would
be dirtier than marking long as deprecated, but that would be less
invasive. Any of that is not backpatchable, of course..
No, I don't like that. Using size_t for memory-size variables is good
discipline. Moreover, I'm not convinced that even with 64-bit ints,
overflow would be impossible in all the places I fixed here. They're
multiplying several potentially very large values (one of which
is a float). I think this is just plain sloppy coding, independently
of which bit-width you choose to be sloppy in.
Yeah, using size_t where appropriate is usually a good idea.
--
Michael
On 2021-Jul-25, Ranier Vilela wrote:
BTW, one aspect of this that I'm unsure how to tackle is the
common usage of "L" constants; in particular, "work_mem * 1024L"
is a really common idiom that we'll need to get rid of. Not sure
that grep will be a useful aid for finding those.

I can see 30 matches in the head tree. (grep -d "1024L" *.c)
git grep '[0-9]L\>' -- *.[chyl]
shows some more constants.
--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
On 2021-Jul-25, Ranier Vilela wrote:
BTW, one aspect of this that I'm unsure how to tackle is the
common usage of "L" constants; in particular, "work_mem * 1024L"
is a really common idiom that we'll need to get rid of. Not sure
that grep will be a useful aid for finding those.

I can see 30 matches in the head tree. (grep -d "1024L" *.c)
git grep '[0-9]L\>' -- *.[chyl]
shows some more constants.
git grep -Eiw '(0x[0-9a-f]+|[0-9]+)U?LL?' -- *.[chyl]
gives about a hundred more hits.
We also have the (U)INT64CONST() macros, which are about two
thirds as common as the U?LL? suffixes.
- ilmari
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
We also have the (U)INT64CONST() macros, which are about two
thirds as common as the U?LL? suffixes.
Yeah. Ideally we'd forbid direct use of the suffixes and insist
you go through those macros, but I don't know of any way that
we could enforce such a coding rule, short of grepping the tree
periodically.
regards, tom lane
On 2021-Jul-26, Tom Lane wrote:
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
We also have the (U)INT64CONST() macros, which are about two
thirds as common as the U?LL? suffixes.

Yeah. Ideally we'd forbid direct use of the suffixes and insist
you go through those macros, but I don't know of any way that
we could enforce such a coding rule, short of grepping the tree
periodically.
IIRC we have one buildfarm member that warns us about perlcritic; maybe
this is just another setup of that sort.
(Personally I run the perlcritic check in my local commit-verifying
script before pushing.)
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"XML!" Exclaimed C++. "What are you doing here? You're not a programming
language."
"Tell that to the people who use me," said XML.
https://burningbird.net/the-parable-of-the-languages/
Hi,
On 2021-07-26 11:38:41 +0900, Michael Paquier wrote:
On Sun, Jul 25, 2021 at 12:28:04PM -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
We really ought to just remove every single use of long.
I have no objection to that as a long-term goal. But I'm not volunteering
to do all the work, and in any case it wouldn't be a back-patchable fix.
I feel that we do need to do something about this performance regression
in v13.

Another idea may be to be more aggressive in c.h? A tweak there would
be dirtier than marking long as deprecated, but that would be less
invasive. Any of that is not backpatchable, of course..
Hard to see how that could work - plenty system headers use long...
Greetings,
Andres Freund