init_htab causes SIGFPE (or worse) due to miscalculation for large nbuckets

Started by Jeff Davisover 13 years ago9 messagesbugs
Jump to latest
#1Jeff Davis
pgsql@j-davis.com

If nbuckets is around 2^30+1, which can happen if work_mem is very high,
then:

nbuckets = 1 << my_log2(lnbuckets);

ends up as 1 << 31, which is negative, leading to a SIGFPE on my
machine, but I think it can also lead to an infinite loop or a crash
(after corrupting the HASHHDR).

The only simple way I can reproduce this is with gdb:

1. attach gdb to a session
2. set a breakpoint in ExecInitRecursiveUnion and continue
3. execute in session:
set work_mem='100GB';
with recursive r (i) as
(select 1 union select i+1 from r where i < 10)
select * from r;
4. (gdb) set node->numGroups = (1 << 30) + 1
5. (gdb) continue

I think we should just cap nbuckets at 1 << 30 in init_htab.

There was a previous fix here:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=299d1716525c659f0e02840e31fbe4dea3

But it assumed that init_htab could deal with INT_MAX. In practice,
work_mem will usually be the limiting factor anyway, but not if it's set
high.

Regards,
Jeff Davis

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#1)
Re: init_htab causes SIGFPE (or worse) due to miscalculation for large nbuckets

Jeff Davis <pgsql@j-davis.com> writes:

If nbuckets is around 2^30+1, which can happen if work_mem is very high,
then:
nbuckets = 1 << my_log2(lnbuckets);
ends up as 1 << 31, which is negative, leading to a SIGFPE on my
machine, but I think it can also lead to an infinite loop or a crash
(after corrupting the HASHHDR).
...
I think we should just cap nbuckets at 1 << 30 in init_htab.

Yeah. After looking at other uses of my_log2, it seems like
hash_estimate_size and hash_select_dirsize probably should also
bound their inputs to avoid overflow of 1 << my_log2() calculations.

Alternatively, maybe we should hack my_log2 to do that bounding.
As-is, it seems like trouble waiting to happen. This won't protect
init_htab, which needs the shift result to fit in int not long.
But my_log2 is just plain broken for inputs larger than LONG_MAX/2,
anyway.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#2)
Re: init_htab causes SIGFPE (or worse) due to miscalculation for large nbuckets

On Sun, 2012-12-09 at 17:45 -0500, Tom Lane wrote:

Yeah. After looking at other uses of my_log2, it seems like
hash_estimate_size and hash_select_dirsize probably should also
bound their inputs to avoid overflow of 1 << my_log2() calculations.

Alternatively, maybe we should hack my_log2 to do that bounding.
As-is, it seems like trouble waiting to happen. This won't protect
init_htab, which needs the shift result to fit in int not long.
But my_log2 is just plain broken for inputs larger than LONG_MAX/2,
anyway.

It looks like all of the callers, except two, immediately shift the
result. So perhaps it would be better to make a new function (something
like "ceil_pow2") that returns the lowest power of two greater than or
equal to the input, and it can return a long (bounded to +LONG_MAX).
Then, the caller can bound the result further if needed, which should be
less error-prone, because the caller would see that it returns a long
(and compiler, but I don't seem to get a warning for implicit long->int
conversions).

Both of the callers that actually want a log2 already assume that the
input is a power of two, so we can redefine my_log2 to require it.

Regards,
Jeff Davis

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#3)
Re: init_htab causes SIGFPE (or worse) due to miscalculation for large nbuckets

Jeff Davis <pgsql@j-davis.com> writes:

On Sun, 2012-12-09 at 17:45 -0500, Tom Lane wrote:

Alternatively, maybe we should hack my_log2 to do that bounding.
As-is, it seems like trouble waiting to happen. This won't protect
init_htab, which needs the shift result to fit in int not long.
But my_log2 is just plain broken for inputs larger than LONG_MAX/2,
anyway.

It looks like all of the callers, except two, immediately shift the
result. So perhaps it would be better to make a new function (something
like "ceil_pow2") that returns the lowest power of two greater than or
equal to the input, and it can return a long (bounded to +LONG_MAX).
Then, the caller can bound the result further if needed, which should be
less error-prone, because the caller would see that it returns a long
(and compiler, but I don't seem to get a warning for implicit long->int
conversions).

Both of the callers that actually want a log2 already assume that the
input is a power of two, so we can redefine my_log2 to require it.

It seems reasonably possible that add-on code is using my_log2, so I'm
hesitant to change the function's signature, or to change its behavior
more than minimally necessary --- especially in the back branches.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#3)
Re: init_htab causes SIGFPE (or worse) due to miscalculation for large nbuckets

Jeff Davis <pgsql@j-davis.com> writes:

It looks like all of the callers, except two, immediately shift the
result. So perhaps it would be better to make a new function (something
like "ceil_pow2") that returns the lowest power of two greater than or
equal to the input, and it can return a long (bounded to +LONG_MAX).

That does seem like a good idea. We need one for an int-sized result
too, to fix the original problem in init_htab. So I propose these
functions:

/* calculate ceil(log base 2) of num */
int
my_log2(long num)
{
int i;
long limit;

/* guard against too-large input, which would put us into infinite loop */
if (num > LONG_MAX / 2)
num = LONG_MAX / 2;

for (i = 0, limit = 1; limit < num; i++, limit <<= 1)
;
return i;
}

/* calculate first power of 2 >= num, bounded to what will fit in a long */
long
next_power_of_two_long(long num)
{
/* my_log2's internal range check is sufficient */
return 1L << my_log2(num);
}

/* calculate first power of 2 >= num, bounded to what will fit in an int */
int
next_power_of_two_int(long num)
{
if (num > INT_MAX / 2)
num = INT_MAX / 2;
return 1 << my_log2(num);
}

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#6Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#5)
Re: init_htab causes SIGFPE (or worse) due to miscalculation for large nbuckets

On Mon, 2012-12-10 at 20:27 -0500, Tom Lane wrote:

Jeff Davis <pgsql@j-davis.com> writes:

It looks like all of the callers, except two, immediately shift the
result. So perhaps it would be better to make a new function (something
like "ceil_pow2") that returns the lowest power of two greater than or
equal to the input, and it can return a long (bounded to +LONG_MAX).

That does seem like a good idea. We need one for an int-sized result
too, to fix the original problem in init_htab. So I propose these
functions:

Looks good to me. One other corner case in the version of the patch I
was working on was that nbuckets is compared to num_partitions, which is
a long. We can assert that it is less than or equal to INT_MAX in
hash_create.

Aside from that, I'll drop my version of the patch, which doesn't have
any useful differences from yours.

Regards,
Jeff Davis

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#6)
Re: init_htab causes SIGFPE (or worse) due to miscalculation for large nbuckets

Jeff Davis <pgsql@j-davis.com> writes:

On Mon, 2012-12-10 at 20:27 -0500, Tom Lane wrote:

That does seem like a good idea. We need one for an int-sized result
too, to fix the original problem in init_htab. So I propose these
functions:

Looks good to me. One other corner case in the version of the patch I
was working on was that nbuckets is compared to num_partitions, which is
a long. We can assert that it is less than or equal to INT_MAX in
hash_create.

Aside from that, I'll drop my version of the patch, which doesn't have
any useful differences from yours.

I hadn't gone any further than to code and test the functions I listed.
If you are working on a complete patch, please press on.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#7)
Re: init_htab causes SIGFPE (or worse) due to miscalculation for large nbuckets

On Mon, 2012-12-10 at 21:19 -0500, Tom Lane wrote:

I hadn't gone any further than to code and test the functions I listed.
If you are working on a complete patch, please press on.

Attached. This fixes my test case[1]The test case will just eat a lot of memory right now, but that's only because I set work_mem so high. So, it doesn't actually complete, but it no longer corrupts the HASHHDR. and uses the functions that you
wrote. I made them static because I couldn't think of a reason for
something outside to call them.

Regards,
Jeff Davis

[1]: The test case will just eat a lot of memory right now, but that's only because I set work_mem so high. So, it doesn't actually complete, but it no longer corrupts the HASHHDR.
only because I set work_mem so high. So, it doesn't actually complete,
but it no longer corrupts the HASHHDR.

Attachments:

htab.patch.gzapplication/x-gzip; name=htab.patch.gzDownload+52-23
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#8)
Re: init_htab causes SIGFPE (or worse) due to miscalculation for large nbuckets

Jeff Davis <pgsql@j-davis.com> writes:

On Mon, 2012-12-10 at 21:19 -0500, Tom Lane wrote:

I hadn't gone any further than to code and test the functions I listed.
If you are working on a complete patch, please press on.

Attached. This fixes my test case[1] and uses the functions that you
wrote. I made them static because I couldn't think of a reason for
something outside to call them.

Applied with minor adjustments.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs