Avoid overflow with simplehash

Started by Ranier Vilelaover 2 years ago13 messages
#1Ranier Vilela
ranier.vf@gmail.com
1 attachment(s)

Hi,

SimpleHash.

The function SH_START_ITERATE can trigger some overflow.

See:
typedef struct SH_ITERATOR
{
uint32 cur; /* current element */
uint32 end;
bool done; /* iterator exhausted? */
} SH_ITERATOR;

The cur field is uint32 size and currently can be stored a uint64,
which obviously does not fit.

Also, the current index is int, which is possibly insufficient
since items can be up to uint32.

Attached a fix.

best regards,
Ranier Vilela

Attachments:

avoid-overflow-with-simplehash-start-iterate.patchapplication/octet-stream; name=avoid-overflow-with-simplehash-start-iterate.patchDownload
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 48db837ec8..d7711781de 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -964,8 +964,8 @@ SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
 SH_SCOPE void
 SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter)
 {
-	int			i;
-	uint64		startelem = PG_UINT64_MAX;
+	uint32		i;
+	uint32		startelem = PG_UINT32_MAX;
 
 	/*
 	 * Search for the first empty element. As deletions during iterations are
@@ -983,7 +983,7 @@ SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter)
 		}
 	}
 
-	Assert(startelem < SH_MAX_SIZE);
+	Assert(startelem < PG_UINT32_MAX);
 
 	/*
 	 * Iterate backwards, that allows the current element to be deleted, even
#2Daniel Gustafsson
daniel@yesql.se
In reply to: Ranier Vilela (#1)
Re: Avoid overflow with simplehash

On 6 Jul 2023, at 16:28, Ranier Vilela <ranier.vf@gmail.com> wrote:

The function SH_START_ITERATE can trigger some overflow.

See:
typedef struct SH_ITERATOR
{
uint32 cur; /* current element */
uint32 end;
bool done; /* iterator exhausted? */
} SH_ITERATOR;

The cur field is uint32 size and currently can be stored a uint64,
which obviously does not fit.

-	Assert(startelem < SH_MAX_SIZE);
+	Assert(startelem < PG_UINT32_MAX);

I mighe be missing something, but from skimming the current code, SH_MAX_SIZE
is currently defined as:

#define SH_MAX_SIZE (((uint64) PG_UINT32_MAX) + 1)

Can you show a reproducer example where you are able to overflow?

--
Daniel Gustafsson

#3Ranier Vilela
ranier.vf@gmail.com
In reply to: Daniel Gustafsson (#2)
Re: Avoid overflow with simplehash

Em qui., 6 de jul. de 2023 às 11:37, Daniel Gustafsson <daniel@yesql.se>
escreveu:

On 6 Jul 2023, at 16:28, Ranier Vilela <ranier.vf@gmail.com> wrote:

The function SH_START_ITERATE can trigger some overflow.

See:
typedef struct SH_ITERATOR
{
uint32 cur; /* current element */
uint32 end;
bool done; /* iterator exhausted? */
} SH_ITERATOR;

The cur field is uint32 size and currently can be stored a uint64,
which obviously does not fit.

-       Assert(startelem < SH_MAX_SIZE);
+       Assert(startelem < PG_UINT32_MAX);

I mighe be missing something, but from skimming the current code,
SH_MAX_SIZE
is currently defined as:

#define SH_MAX_SIZE (((uint64) PG_UINT32_MAX) + 1)

This is Assert, that is, in production this test is not done.

See the comments:
"Search for the first empty element."

If the empty element is not found, startelem has PG_UINT64_MAX value,
which do not fit in uint32.

Can you see this?

regards,
Ranier Vilela

#4Daniel Gustafsson
daniel@yesql.se
In reply to: Ranier Vilela (#3)
Re: Avoid overflow with simplehash

On 6 Jul 2023, at 16:42, Ranier Vilela <ranier.vf@gmail.com> wrote:
Em qui., 6 de jul. de 2023 às 11:37, Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> escreveu:

#define SH_MAX_SIZE (((uint64) PG_UINT32_MAX) + 1)
This is Assert, that is, in production this test is not done.

Correct, which implies that it's a test for something which is deemed highly
unlikely to happen in production.

If the empty element is not found, startelem has PG_UINT64_MAX value,
which do not fit in uint32.

Can you show an example where the hash isn't grown automatically to accomodate
this such that the assertion is tripped?

--
Daniel Gustafsson

#5Ranier Vilela
ranier.vf@gmail.com
In reply to: Daniel Gustafsson (#4)
Re: Avoid overflow with simplehash

Em qui., 6 de jul. de 2023 às 12:00, Daniel Gustafsson <daniel@yesql.se>
escreveu:

On 6 Jul 2023, at 16:42, Ranier Vilela <ranier.vf@gmail.com> wrote:
Em qui., 6 de jul. de 2023 às 11:37, Daniel Gustafsson <daniel@yesql.se

<mailto:daniel@yesql.se>> escreveu:

#define SH_MAX_SIZE (((uint64) PG_UINT32_MAX) + 1)
This is Assert, that is, in production this test is not done.

Correct, which implies that it's a test for something which is deemed
highly
unlikely to happen in production.

Highly improbable does not mean impossible, or that it will never happen.

If the empty element is not found, startelem has PG_UINT64_MAX value,
which do not fit in uint32.

Can you show an example where the hash isn't grown automatically to
accomodate
this such that the assertion is tripped?

A demo won't change the fact that the function can fail, even if it isn't
currently failing.
As a precaution to avoid future bugs, I think it's necessary to apply the
patch to increase the robustness of the function.

regards,
Ranier Vilela

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ranier Vilela (#3)
Re: Avoid overflow with simplehash

Ranier Vilela <ranier.vf@gmail.com> writes:

See the comments:
"Search for the first empty element."
If the empty element is not found, startelem has PG_UINT64_MAX value,
which do not fit in uint32.

I think the point of that assertion is exactly that we're required to
have an empty element (because max fillfactor is less than 1),
so the search should have succeeded.

It does seem like we could do

uint64 startelem = SH_MAX_SIZE;

...

Assert(startelem < SH_MAX_SIZE);

which'd make it a little clearer that the expectation is for
startelem to have changed value. And I agree that declaring "i"
as int is wrong.

regards, tom lane

#7Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#6)
Re: Avoid overflow with simplehash

Hi,

On 2023-07-06 11:16:26 -0400, Tom Lane wrote:

Ranier Vilela <ranier.vf@gmail.com> writes:

See the comments:
"Search for the first empty element."
If the empty element is not found, startelem has PG_UINT64_MAX value,
which do not fit in uint32.

I think the point of that assertion is exactly that we're required to
have an empty element (because max fillfactor is less than 1),
so the search should have succeeded.

Right, that part of the proposed change seems bogus to me.

It does seem like we could do

uint64 startelem = SH_MAX_SIZE;

...

Assert(startelem < SH_MAX_SIZE);

which'd make it a little clearer that the expectation is for
startelem to have changed value.

I guess? I find it easier to understand all-bits-set in a coredump as
too-large than SH_MAX_SIZE, but ...

And I agree that declaring "i" as int is wrong.

Yea, that's definitely not right, not sure how I ended up with that. Will push
a fix. I guess it should be backpatched...

Greetings,

Andres Freund

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#7)
Re: Avoid overflow with simplehash

Andres Freund <andres@anarazel.de> writes:

On 2023-07-06 11:16:26 -0400, Tom Lane wrote:

It does seem like we could do
uint64 startelem = SH_MAX_SIZE;
...
Assert(startelem < SH_MAX_SIZE);
which'd make it a little clearer that the expectation is for
startelem to have changed value.

I guess? I find it easier to understand all-bits-set in a coredump as
too-large than SH_MAX_SIZE, but ...

What'd help even more is a comment:

/* We should have found an empty element */
Assert(startelem < SH_MAX_SIZE);

regards, tom lane

#9Ranier Vilela
ranier.vf@gmail.com
In reply to: Tom Lane (#6)
Re: Avoid overflow with simplehash

Em qui., 6 de jul. de 2023 às 12:16, Tom Lane <tgl@sss.pgh.pa.us> escreveu:

Ranier Vilela <ranier.vf@gmail.com> writes:

See the comments:
"Search for the first empty element."
If the empty element is not found, startelem has PG_UINT64_MAX value,
which do not fit in uint32.

Hi Tom,

I think the point of that assertion is exactly that we're required to
have an empty element (because max fillfactor is less than 1),
so the search should have succeeded.

It does seem like we could do

uint64 startelem = SH_MAX_SIZE;

...

Assert(startelem < SH_MAX_SIZE);

which'd make it a little clearer that the expectation is for
startelem to have changed value.

I still have doubts about this.

see:
#include <iostream>
#include <string>
#include <limits.h>

#define SH_MAX_SIZE1 (((unsigned long long) 0xFFFFFFFFU) + 1)
#define SH_MAX_SIZE2 (((unsigned long long) 0xFFFFFFFFU) - 1)

int main()
{
unsigned long long max_size1 = SH_MAX_SIZE1;
unsigned long long max_size2 = SH_MAX_SIZE2;
unsigned int cur1 = SH_MAX_SIZE1;
unsigned int cur2 = SH_MAX_SIZE2;

printf("SH_MAX_SIZE1=%llu\n", max_size1);
printf("SH_MAX_SIZE2=%llu\n", max_size2);
printf("cur1=%u\n", cur1);
printf("cur2=%u\n", cur2);
}
warning: implicit conversion from 'unsigned long long' to 'unsigned int'
changes value from 4294967296 to 0 [-Wconstant-conversion]

outputs:
SH_MAX_SIZE1=4294967296
SH_MAX_SIZE2=4294967294
cur1=0
cur2=4294967294

And in the comments we have:
"Iterate backwards, that allows the current element to be deleted, even
* if there are backward shifts"

So if an empty element is not found and the *cur* field is set to 0
(SH_MAX_SIZE -> uint32),
then will it iterate forwards?

And I agree that declaring "i"

as int is wrong.

Thanks for the confirmation.

regards,
Ranier Vilela

#10Andres Freund
andres@anarazel.de
In reply to: Ranier Vilela (#9)
Re: Avoid overflow with simplehash

Hi,

On 2023-07-06 13:40:00 -0300, Ranier Vilela wrote:

I still have doubts about this.

see:
#include <iostream>
#include <string>
#include <limits.h>

#define SH_MAX_SIZE1 (((unsigned long long) 0xFFFFFFFFU) + 1)
#define SH_MAX_SIZE2 (((unsigned long long) 0xFFFFFFFFU) - 1)

int main()
{
unsigned long long max_size1 = SH_MAX_SIZE1;
unsigned long long max_size2 = SH_MAX_SIZE2;
unsigned int cur1 = SH_MAX_SIZE1;
unsigned int cur2 = SH_MAX_SIZE2;

printf("SH_MAX_SIZE1=%llu\n", max_size1);
printf("SH_MAX_SIZE2=%llu\n", max_size2);
printf("cur1=%u\n", cur1);
printf("cur2=%u\n", cur2);
}
warning: implicit conversion from 'unsigned long long' to 'unsigned int'
changes value from 4294967296 to 0 [-Wconstant-conversion]

I don't think we at the moment try to not have implicit-conversion warnings
(nor do we enable them), this would be far from the only place. If we wanted
to here, we'd just need an explicit cast.

outputs:
SH_MAX_SIZE1=4294967296
SH_MAX_SIZE2=4294967294
cur1=0
cur2=4294967294

And in the comments we have:
"Iterate backwards, that allows the current element to be deleted, even
* if there are backward shifts"

So if an empty element is not found and the *cur* field is set to 0
(SH_MAX_SIZE -> uint32),

That should never be reachable - which the assert tries to ensure.

then will it iterate forwards?

No, it'd still iterate backwards, but starting from the wrong place - but
there is no correct place to start iterating from if there is no unused
element.

Greetings,

Andres Freund

#11Ranier Vilela
ranier.vf@gmail.com
In reply to: Andres Freund (#10)
1 attachment(s)
Re: Avoid overflow with simplehash

Em qui., 6 de jul. de 2023 às 13:52, Andres Freund <andres@anarazel.de>
escreveu:

Hi,

On 2023-07-06 13:40:00 -0300, Ranier Vilela wrote:

I still have doubts about this.

see:
#include <iostream>
#include <string>
#include <limits.h>

#define SH_MAX_SIZE1 (((unsigned long long) 0xFFFFFFFFU) + 1)
#define SH_MAX_SIZE2 (((unsigned long long) 0xFFFFFFFFU) - 1)

int main()
{
unsigned long long max_size1 = SH_MAX_SIZE1;
unsigned long long max_size2 = SH_MAX_SIZE2;
unsigned int cur1 = SH_MAX_SIZE1;
unsigned int cur2 = SH_MAX_SIZE2;

printf("SH_MAX_SIZE1=%llu\n", max_size1);
printf("SH_MAX_SIZE2=%llu\n", max_size2);
printf("cur1=%u\n", cur1);
printf("cur2=%u\n", cur2);
}
warning: implicit conversion from 'unsigned long long' to 'unsigned int'
changes value from 4294967296 to 0 [-Wconstant-conversion]

I don't think we at the moment try to not have implicit-conversion warnings
(nor do we enable them), this would be far from the only place. If we
wanted
to here, we'd just need an explicit cast.

It was just for show.

outputs:
SH_MAX_SIZE1=4294967296
SH_MAX_SIZE2=4294967294
cur1=0
cur2=4294967294

And in the comments we have:
"Iterate backwards, that allows the current element to be deleted, even
* if there are backward shifts"

So if an empty element is not found and the *cur* field is set to 0
(SH_MAX_SIZE -> uint32),

That should never be reachable - which the assert tries to ensure.

Right.

then will it iterate forwards?

No, it'd still iterate backwards, but starting from the wrong place - but
there is no correct place to start iterating from if there is no unused
element.

Thanks for the confirmation.

So I suppose we could have this in v1, attached.
With comments added by Tom.

regards,
Ranier Vilela

Attachments:

v1-avoid-overflow-with-simplehash-start-iterate.patchapplication/octet-stream; name=v1-avoid-overflow-with-simplehash-start-iterate.patchDownload
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 48db837ec8..4fe627a921 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -964,8 +964,8 @@ SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
 SH_SCOPE void
 SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter)
 {
-	int			i;
-	uint64		startelem = PG_UINT64_MAX;
+	uint32		i;
+	uint32		startelem = PG_UINT32_MAX;
 
 	/*
 	 * Search for the first empty element. As deletions during iterations are
@@ -983,7 +983,8 @@ SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter)
 		}
 	}
 
-	Assert(startelem < SH_MAX_SIZE);
+	/* We should have found an empty element */
+	Assert(startelem < SH_SIZE_MAX);
 
 	/*
 	 * Iterate backwards, that allows the current element to be deleted, even
#12Andres Freund
andres@anarazel.de
In reply to: Ranier Vilela (#11)
Re: Avoid overflow with simplehash

Hi,

I pushed changing i to uint32 and adding Tom's comment to 11-HEAD.

On 2023-07-06 14:01:55 -0300, Ranier Vilela wrote:

then will it iterate forwards?

No, it'd still iterate backwards, but starting from the wrong place - but
there is no correct place to start iterating from if there is no unused
element.

Thanks for the confirmation.

So I suppose we could have this in v1, attached.
With comments added by Tom.

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 48db837ec8..4fe627a921 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -964,8 +964,8 @@ SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
SH_SCOPE void
SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter)
{
-	int			i;
-	uint64		startelem = PG_UINT64_MAX;
+	uint32		i;
+	uint32		startelem = PG_UINT32_MAX;

The startelem type change doesn't strike me as a good idea. Currently
PG_UINT32_MAX is a valid element.

Greetings,

Andres Freund

#13Ranier Vilela
ranier.vf@gmail.com
In reply to: Andres Freund (#12)
Re: Avoid overflow with simplehash

Em qui., 6 de jul. de 2023 às 14:33, Andres Freund <andres@anarazel.de>
escreveu:

Hi,

I pushed changing i to uint32 and adding Tom's comment to 11-HEAD.

Thank you.

regards,
Ranier Vilela