init_htab causes SIGFPE (or worse) due to miscalculation for large nbuckets
If nbuckets is around 2^30+1, which can happen if work_mem is very high,
then:
nbuckets = 1 << my_log2(lnbuckets);
ends up as 1 << 31, which is negative, leading to a SIGFPE on my
machine, but I think it can also lead to an infinite loop or a crash
(after corrupting the HASHHDR).
The only simple way I can reproduce this is with gdb:
1. attach gdb to a session
2. set a breakpoint in ExecInitRecursiveUnion and continue
3. execute in session:
set work_mem='100GB';
with recursive r (i) as
(select 1 union select i+1 from r where i < 10)
select * from r;
4. (gdb) set node->numGroups = (1 << 30) + 1
5. (gdb) continue
I think we should just cap nbuckets at 1 << 30 in init_htab.
There was a previous fix here:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=299d1716525c659f0e02840e31fbe4dea3
But it assumed that init_htab could deal with INT_MAX. In practice,
work_mem will usually be the limiting factor anyway, but not if it's set
high.
Regards,
Jeff Davis
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Jeff Davis <pgsql@j-davis.com> writes:
If nbuckets is around 2^30+1, which can happen if work_mem is very high,
then:
nbuckets = 1 << my_log2(lnbuckets);
ends up as 1 << 31, which is negative, leading to a SIGFPE on my
machine, but I think it can also lead to an infinite loop or a crash
(after corrupting the HASHHDR).
...
I think we should just cap nbuckets at 1 << 30 in init_htab.
Yeah. After looking at other uses of my_log2, it seems like
hash_estimate_size and hash_select_dirsize probably should also
bound their inputs to avoid overflow of 1 << my_log2() calculations.
Alternatively, maybe we should hack my_log2 to do that bounding.
As-is, it seems like trouble waiting to happen. This won't protect
init_htab, which needs the shift result to fit in int not long.
But my_log2 is just plain broken for inputs larger than LONG_MAX/2,
anyway.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Sun, 2012-12-09 at 17:45 -0500, Tom Lane wrote:
Yeah. After looking at other uses of my_log2, it seems like
hash_estimate_size and hash_select_dirsize probably should also
bound their inputs to avoid overflow of 1 << my_log2() calculations.Alternatively, maybe we should hack my_log2 to do that bounding.
As-is, it seems like trouble waiting to happen. This won't protect
init_htab, which needs the shift result to fit in int not long.
But my_log2 is just plain broken for inputs larger than LONG_MAX/2,
anyway.
It looks like all of the callers, except two, immediately shift the
result. So perhaps it would be better to make a new function (something
like "ceil_pow2") that returns the lowest power of two greater than or
equal to the input, and it can return a long (bounded to +LONG_MAX).
Then, the caller can bound the result further if needed, which should be
less error-prone, because the caller would see that it returns a long
(and compiler, but I don't seem to get a warning for implicit long->int
conversions).
Both of the callers that actually want a log2 already assume that the
input is a power of two, so we can redefine my_log2 to require it.
Regards,
Jeff Davis
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Jeff Davis <pgsql@j-davis.com> writes:
On Sun, 2012-12-09 at 17:45 -0500, Tom Lane wrote:
Alternatively, maybe we should hack my_log2 to do that bounding.
As-is, it seems like trouble waiting to happen. This won't protect
init_htab, which needs the shift result to fit in int not long.
But my_log2 is just plain broken for inputs larger than LONG_MAX/2,
anyway.
It looks like all of the callers, except two, immediately shift the
result. So perhaps it would be better to make a new function (something
like "ceil_pow2") that returns the lowest power of two greater than or
equal to the input, and it can return a long (bounded to +LONG_MAX).
Then, the caller can bound the result further if needed, which should be
less error-prone, because the caller would see that it returns a long
(and compiler, but I don't seem to get a warning for implicit long->int
conversions).
Both of the callers that actually want a log2 already assume that the
input is a power of two, so we can redefine my_log2 to require it.
It seems reasonably possible that add-on code is using my_log2, so I'm
hesitant to change the function's signature, or to change its behavior
more than minimally necessary --- especially in the back branches.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Jeff Davis <pgsql@j-davis.com> writes:
It looks like all of the callers, except two, immediately shift the
result. So perhaps it would be better to make a new function (something
like "ceil_pow2") that returns the lowest power of two greater than or
equal to the input, and it can return a long (bounded to +LONG_MAX).
That does seem like a good idea. We need one for an int-sized result
too, to fix the original problem in init_htab. So I propose these
functions:
/* calculate ceil(log base 2) of num */
int
my_log2(long num)
{
int i;
long limit;
/* guard against too-large input, which would put us into infinite loop */
if (num > LONG_MAX / 2)
num = LONG_MAX / 2;
for (i = 0, limit = 1; limit < num; i++, limit <<= 1)
;
return i;
}
/* calculate first power of 2 >= num, bounded to what will fit in a long */
long
next_power_of_two_long(long num)
{
/* my_log2's internal range check is sufficient */
return 1L << my_log2(num);
}
/* calculate first power of 2 >= num, bounded to what will fit in an int */
int
next_power_of_two_int(long num)
{
if (num > INT_MAX / 2)
num = INT_MAX / 2;
return 1 << my_log2(num);
}
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Mon, 2012-12-10 at 20:27 -0500, Tom Lane wrote:
Jeff Davis <pgsql@j-davis.com> writes:
It looks like all of the callers, except two, immediately shift the
result. So perhaps it would be better to make a new function (something
like "ceil_pow2") that returns the lowest power of two greater than or
equal to the input, and it can return a long (bounded to +LONG_MAX).That does seem like a good idea. We need one for an int-sized result
too, to fix the original problem in init_htab. So I propose these
functions:
Looks good to me. One other corner case in the version of the patch I
was working on was that nbuckets is compared to num_partitions, which is
a long. We can assert that it is less than or equal to INT_MAX in
hash_create.
Aside from that, I'll drop my version of the patch, which doesn't have
any useful differences from yours.
Regards,
Jeff Davis
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Jeff Davis <pgsql@j-davis.com> writes:
On Mon, 2012-12-10 at 20:27 -0500, Tom Lane wrote:
That does seem like a good idea. We need one for an int-sized result
too, to fix the original problem in init_htab. So I propose these
functions:
Looks good to me. One other corner case in the version of the patch I
was working on was that nbuckets is compared to num_partitions, which is
a long. We can assert that it is less than or equal to INT_MAX in
hash_create.
Aside from that, I'll drop my version of the patch, which doesn't have
any useful differences from yours.
I hadn't gone any further than to code and test the functions I listed.
If you are working on a complete patch, please press on.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Mon, 2012-12-10 at 21:19 -0500, Tom Lane wrote:
I hadn't gone any further than to code and test the functions I listed.
If you are working on a complete patch, please press on.
Attached. This fixes my test case[1]The test case will just eat a lot of memory right now, but that's only because I set work_mem so high. So, it doesn't actually complete, but it no longer corrupts the HASHHDR. and uses the functions that you
wrote. I made them static because I couldn't think of a reason for
something outside to call them.
Regards,
Jeff Davis
[1]: The test case will just eat a lot of memory right now, but that's only because I set work_mem so high. So, it doesn't actually complete, but it no longer corrupts the HASHHDR.
only because I set work_mem so high. So, it doesn't actually complete,
but it no longer corrupts the HASHHDR.
Attachments:
htab.patch.gzapplication/x-gzip; name=htab.patch.gzDownload+52-23
Jeff Davis <pgsql@j-davis.com> writes:
On Mon, 2012-12-10 at 21:19 -0500, Tom Lane wrote:
I hadn't gone any further than to code and test the functions I listed.
If you are working on a complete patch, please press on.
Attached. This fixes my test case[1] and uses the functions that you
wrote. I made them static because I couldn't think of a reason for
something outside to call them.
Applied with minor adjustments.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs