glibc qsort() vulnerability

Started by Mats Kindahlabout 2 years ago61 messages
Jump to latest
#1Mats Kindahl
mats@timescale.com

Hi hackers,

There is a bug in glibc's qsort() algorithm that runs the risk of creating
an out-of-bounds error if the comparison function is not transitive, for
example, if subtraction is used so that it can create an overflow.

See
https://packetstormsecurity.com/files/176931/glibc-qsort-Out-Of-Bounds-Read-Write.html

I checked the existing functions in the latest version of Postgres source
code and most are safe, but there were a few ones that could lead to
overflow. I do not know if these can actually lead to problems, but better
safe than sorry, so I created a patch to fix those few cases and add a
comment to one case that was not clear that it could not overflow.

Best wishes,
Mats Kindahl, Timescale

Attachments:

0001-Ensure-comparison-functions-are-transitive.patchtext/x-patch; charset=US-ASCII; name=0001-Ensure-comparison-functions-are-transitive.patchDownload+21-6
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mats Kindahl (#1)
Re: glibc qsort() vulnerability

Mats Kindahl <mats@timescale.com> writes:

There is a bug in glibc's qsort() algorithm that runs the risk of creating
an out-of-bounds error if the comparison function is not transitive, for
example, if subtraction is used so that it can create an overflow.

We don't use glibc's qsort. Have you checked whether there's a
problem with the code we do use?

regards, tom lane

#3Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#2)
Re: glibc qsort() vulnerability

On Tue, Feb 06, 2024 at 10:11:16AM -0500, Tom Lane wrote:

Mats Kindahl <mats@timescale.com> writes:

There is a bug in glibc's qsort() algorithm that runs the risk of creating
an out-of-bounds error if the comparison function is not transitive, for
example, if subtraction is used so that it can create an overflow.

We don't use glibc's qsort. Have you checked whether there's a
problem with the code we do use?

Oh, interesting. I didn't know that! As of commit 6edd2b4, we've used an
in-tree qsort() for everything. port.h has the following line:

#define qsort(a,b,c,d) pg_qsort(a,b,c,d)

I see a handful of callers that use pg_qsort() directly, which IMO is odd.
I would've expected that to be something different if I didn't know about
that macro. Maybe we should change those to qsort()...

Even if the glibc issue doesn't apply to Postgres, I'm tempted to suggest
that we make it project policy that comparison functions must be
transitive. There might be no real issues today, but if we write all
comparison functions the way Mats is suggesting, it should be easier to
reason about overflow risks.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#3)
Re: glibc qsort() vulnerability

Nathan Bossart <nathandbossart@gmail.com> writes:

Even if the glibc issue doesn't apply to Postgres, I'm tempted to suggest
that we make it project policy that comparison functions must be
transitive. There might be no real issues today, but if we write all
comparison functions the way Mats is suggesting, it should be easier to
reason about overflow risks.

A comparison routine that is not is probably broken, agreed.
I didn't look through the details of the patch --- I was more
curious whether we had a version of the qsort bug, because
if we do, we should fix that too.

regards, tom lane

#5Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#4)
Re: glibc qsort() vulnerability

On Tue, Feb 06, 2024 at 03:55:58PM -0500, Tom Lane wrote:

A comparison routine that is not is probably broken, agreed.
I didn't look through the details of the patch --- I was more
curious whether we had a version of the qsort bug, because
if we do, we should fix that too.

+1

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#6Mats Kindahl
mats@timescale.com
In reply to: Tom Lane (#2)
Re: glibc qsort() vulnerability

On Tue, Feb 6, 2024 at 4:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Mats Kindahl <mats@timescale.com> writes:

There is a bug in glibc's qsort() algorithm that runs the risk of

creating

an out-of-bounds error if the comparison function is not transitive, for
example, if subtraction is used so that it can create an overflow.

We don't use glibc's qsort. Have you checked whether there's a
problem with the code we do use?

Interesting. No, haven't checked. Will do that.

Best wishes,
Mats Kindahl

Show quoted text

regards, tom lane

#7Mats Kindahl
mats@timescale.com
In reply to: Tom Lane (#4)
Re: glibc qsort() vulnerability

On Tue, Feb 6, 2024 at 9:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Nathan Bossart <nathandbossart@gmail.com> writes:

Even if the glibc issue doesn't apply to Postgres, I'm tempted to suggest
that we make it project policy that comparison functions must be
transitive. There might be no real issues today, but if we write all
comparison functions the way Mats is suggesting, it should be easier to
reason about overflow risks.

A comparison routine that is not is probably broken, agreed.
I didn't look through the details of the patch --- I was more
curious whether we had a version of the qsort bug, because
if we do, we should fix that too.

The patch basically removes the risk of overflow in three routines and just
returns -1, 0, or 1, and adds a comment in one.

The routines modified do a subtraction of int:s and return that, which can
cause an overflow. This method is used for some int16 as well but since
standard conversions in C will perform the arithmetics in "int" precision,
this cannot overflow, so added a comment there. It might still be a good
idea to follow the same pattern for the int16 routines, but since there is
no bug there, I did not add them to the patch.

Best wishes,
Mats Kindahl

Show quoted text

regards, tom lane

#8Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Mats Kindahl (#7)
Re: glibc qsort() vulnerability

On 07/02/2024 11:09, Mats Kindahl wrote:

On Tue, Feb 6, 2024 at 9:56 PM Tom Lane <tgl@sss.pgh.pa.us
<mailto:tgl@sss.pgh.pa.us>> wrote:

Nathan Bossart <nathandbossart@gmail.com
<mailto:nathandbossart@gmail.com>> writes:

Even if the glibc issue doesn't apply to Postgres, I'm tempted to

suggest

that we make it project policy that comparison functions must be
transitive.  There might be no real issues today, but if we write all
comparison functions the way Mats is suggesting, it should be

easier to

reason about overflow risks.

A comparison routine that is not is probably broken, agreed.
I didn't look through the details of the patch --- I was more
curious whether we had a version of the qsort bug, because
if we do, we should fix that too.

The patch basically removes the risk of overflow in three routines and
just returns -1, 0, or 1, and adds a comment in one.

The routines modified do a subtraction of int:s and return that, which
can cause an overflow. This method is used for some int16 as well but
since standard conversions in C will perform the arithmetics in "int"
precision, this cannot overflow, so added a comment there. It might
still be a good idea to follow the same pattern for the int16 routines,
but since there is no bug there, I did not add them to the patch.

Doesn't hurt to fix the comparison functions, and +1 on using the same
pattern everywhere.

However, we use our qsort() with user-defined comparison functions, and
we cannot make any guarantees about what they might do. So we must
ensure that our qsort() doesn't overflow, no matter what the comparison
function does.

Looking at our ST_SORT(), it seems safe to me.

--
Heikki Linnakangas
Neon (https://neon.tech)

#9Nathan Bossart
nathandbossart@gmail.com
In reply to: Heikki Linnakangas (#8)
Re: glibc qsort() vulnerability

On Wed, Feb 07, 2024 at 08:46:56PM +0200, Heikki Linnakangas wrote:

Doesn't hurt to fix the comparison functions, and +1 on using the same
pattern everywhere.

I attached a new version of the patch with some small adjustments. I
haven't looked through all in-tree qsort() comparators to see if any others
need to be adjusted, but we should definitely do so as part of this thread.
Mats, are you able to do this?

However, we use our qsort() with user-defined comparison functions, and we
cannot make any guarantees about what they might do. So we must ensure that
our qsort() doesn't overflow, no matter what the comparison function does.

Looking at our ST_SORT(), it seems safe to me.

Cool.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-remove-direct-calls-to-pg_qsort.patchtext/x-diff; charset=us-asciiDownload+10-11
v2-0002-Ensure-comparison-functions-are-transitive.patchtext/x-diff; charset=us-asciiDownload+37-11
#10Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#8)
Re: glibc qsort() vulnerability

Hi,

On 2024-02-07 20:46:56 +0200, Heikki Linnakangas wrote:

The routines modified do a subtraction of int:s and return that, which
can cause an overflow. This method is used for some int16 as well but
since standard conversions in C will perform the arithmetics in "int"
precision, this cannot overflow, so added a comment there. It might
still be a good idea to follow the same pattern for the int16 routines,
but since there is no bug there, I did not add them to the patch.

Doesn't hurt to fix the comparison functions, and +1 on using the same
pattern everywhere.

It actually can hurt - the generated code will often be slower.

E.g.
#include <stdint.h>

int cmp_sub(int16_t a, int16_t b) {
return (int32_t) a - (int32_t) b;
}

int cmp_if(int16_t a, int16_t b) {
if (a < b)
return -1;
if (a > b)
return 1;
return 0;
}

yields

cmp_sub:
movsx eax, di
movsx esi, si
sub eax, esi
ret
cmp_if:
xor eax, eax
cmp di, si
mov edx, -1
setg al
cmovl eax, edx
ret

with gcc -O3. With other compilers, e.g. msvc, the difference is considerably
bigger, due to msvc for some reason not using cmov.

See https://godbolt.org/z/34qerPaPE for a few more details.

Now, in most cases this won't matter, the sorting isn't performance
critical. But I don't think it's a good idea to standardize on a generally
slower pattern.

Not that that's a good test, but I did quickly benchmark [1]-- prep CREATE EXTENSION IF NOT EXISTS intarray; DROP TABLE IF EXISTS arrays_to_sort; CREATE TABLE arrays_to_sort AS SELECT array_shuffle(a) arr FROM (SELECT ARRAY(SELECT generate_series(1, 1000000)) a), generate_series(1, 10); -- bench SELECT (sort(arr))[1] FROM arrays_to_sort; this with
intarray. There's about a 10% difference in performance between using the
existing compASC() and one using
return (int64) *(const int32 *) a - (int64) *(const int32 *) b;

Perhaps we could have a central helper for this somewhere?

Greetings,

Andres Freund

[1]: -- prep CREATE EXTENSION IF NOT EXISTS intarray; DROP TABLE IF EXISTS arrays_to_sort; CREATE TABLE arrays_to_sort AS SELECT array_shuffle(a) arr FROM (SELECT ARRAY(SELECT generate_series(1, 1000000)) a), generate_series(1, 10); -- bench SELECT (sort(arr))[1] FROM arrays_to_sort;
-- prep
CREATE EXTENSION IF NOT EXISTS intarray;
DROP TABLE IF EXISTS arrays_to_sort;
CREATE TABLE arrays_to_sort AS SELECT array_shuffle(a) arr FROM (SELECT ARRAY(SELECT generate_series(1, 1000000)) a), generate_series(1, 10);
-- bench
SELECT (sort(arr))[1]-- prep CREATE EXTENSION IF NOT EXISTS intarray; DROP TABLE IF EXISTS arrays_to_sort; CREATE TABLE arrays_to_sort AS SELECT array_shuffle(a) arr FROM (SELECT ARRAY(SELECT generate_series(1, 1000000)) a), generate_series(1, 10); -- bench SELECT (sort(arr))[1] FROM arrays_to_sort; FROM arrays_to_sort;

#11Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#10)
Re: glibc qsort() vulnerability

On Wed, Feb 07, 2024 at 01:48:57PM -0800, Andres Freund wrote:

Now, in most cases this won't matter, the sorting isn't performance
critical. But I don't think it's a good idea to standardize on a generally
slower pattern.

Not that that's a good test, but I did quickly benchmark [1] this with
intarray. There's about a 10% difference in performance between using the
existing compASC() and one using
return (int64) *(const int32 *) a - (int64) *(const int32 *) b;

Perhaps we could have a central helper for this somewhere?

Maybe said helper could use __builtin_sub_overflow() and fall back to the
slow "if" version only if absolutely necessary. The assembly for that
looks encouraging, but I still need to actually test it...

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#12Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#11)
Re: glibc qsort() vulnerability

Hi,

On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote:

On Wed, Feb 07, 2024 at 01:48:57PM -0800, Andres Freund wrote:

Now, in most cases this won't matter, the sorting isn't performance
critical. But I don't think it's a good idea to standardize on a generally
slower pattern.

Not that that's a good test, but I did quickly benchmark [1] this with
intarray. There's about a 10% difference in performance between using the
existing compASC() and one using
return (int64) *(const int32 *) a - (int64) *(const int32 *) b;

Perhaps we could have a central helper for this somewhere?

Maybe said helper could use __builtin_sub_overflow() and fall back to the
slow "if" version only if absolutely necessary.

I suspect that'll be worse code in the common case, given the cmov generated
by gcc & clang for the typical branch-y formulation. But it's worth testing.

The assembly for that looks encouraging, but I still need to actually test
it...

Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit
that doesn't work, given the 32bit return, so we need something more.

Greetings,

Andres Freund

#13Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#12)
Re: glibc qsort() vulnerability

On Wed, Feb 07, 2024 at 04:42:07PM -0800, Andres Freund wrote:

On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote:

The assembly for that looks encouraging, but I still need to actually test
it...

Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit
that doesn't work, given the 32bit return, so we need something more.

For the same compASC() test, I see an ~8.4% improvement with your int64
code and a ~3.4% improvement with this:

int
compASC(const void *a, const void *b)
{
int result;

if (unlikely(pg_sub_s32_overflow(*(const int32 *) a,
*(const int32 *) b,
&result)))
{
if (*(const int32 *) a > *(const int32 *) b)
return 1;
if (*(const int32 *) a < *(const int32 *) b)
return -1;
return 0;
}

return result;
}

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#14Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#13)
Re: glibc qsort() vulnerability

Hi,

On 2024-02-07 19:52:11 -0600, Nathan Bossart wrote:

On Wed, Feb 07, 2024 at 04:42:07PM -0800, Andres Freund wrote:

On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote:

The assembly for that looks encouraging, but I still need to actually test
it...

Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit
that doesn't work, given the 32bit return, so we need something more.

For the same compASC() test, I see an ~8.4% improvement with your int64
code

Just to be clear, that code unfortuntely isn't correct, the return value is a
32 bit integer, so the 64bit difference doesn't help. In contrast to the 16bit
case.

and a ~3.4% improvement with this:

I guess that's still something.

Another branchless variant is (a > b) - (a < b). It seems to get a similar
improvement as the overflow-checking version.

Greetings,

Andres Freund

#15Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#14)
Re: glibc qsort() vulnerability

On Thu, Feb 8, 2024 at 3:06 PM Andres Freund <andres@anarazel.de> wrote:

On 2024-02-07 19:52:11 -0600, Nathan Bossart wrote:

On Wed, Feb 07, 2024 at 04:42:07PM -0800, Andres Freund wrote:

On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote:

The assembly for that looks encouraging, but I still need to actually test
it...

Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit
that doesn't work, given the 32bit return, so we need something more.

For the same compASC() test, I see an ~8.4% improvement with your int64
code

Just to be clear, that code unfortuntely isn't correct, the return value is a
32 bit integer, so the 64bit difference doesn't help. In contrast to the 16bit
case.

Perhaps you could wrap it in a branch-free sign() function so you get
a narrow answer?

https://stackoverflow.com/questions/14579920/fast-sign-of-integer-in-c

#16Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#14)
Re: glibc qsort() vulnerability

On Wed, Feb 07, 2024 at 06:06:37PM -0800, Andres Freund wrote:

Another branchless variant is (a > b) - (a < b). It seems to get a similar
improvement as the overflow-checking version.

Well, that's certainly more elegant. I updated the patch to that approach
for now.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-remove-direct-calls-to-pg_qsort.patchtext/x-diff; charset=us-asciiDownload+10-11
v3-0002-Ensure-comparison-functions-are-transitive.patchtext/x-diff; charset=us-asciiDownload+17-10
#17Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#15)
Re: glibc qsort() vulnerability

On Thu, Feb 8, 2024 at 3:38 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Perhaps you could wrap it in a branch-free sign() function so you get
a narrow answer?

https://stackoverflow.com/questions/14579920/fast-sign-of-integer-in-c

Ah, strike that, it is much like the suggested (a > b) - (a < b) but
with extra steps...

#18Nathan Bossart
nathandbossart@gmail.com
In reply to: Thomas Munro (#17)
Re: glibc qsort() vulnerability

On Thu, Feb 08, 2024 at 03:49:03PM +1300, Thomas Munro wrote:

On Thu, Feb 8, 2024 at 3:38 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Perhaps you could wrap it in a branch-free sign() function so you get
a narrow answer?

https://stackoverflow.com/questions/14579920/fast-sign-of-integer-in-c

Ah, strike that, it is much like the suggested (a > b) - (a < b) but
with extra steps...

Yeah, https://godbolt.org/ indicates that the sign approach compiles to

movsx rsi, esi
movsx rdi, edi
xor eax, eax
sub rdi, rsi
test rdi, rdi
setg al
shr rdi, 63
sub eax, edi
ret

while the approach Andres suggested compiles to

xor eax, eax
cmp edi, esi
setl dl
setg al
movzx edx, dl
sub eax, edx
ret

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#19Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#18)
Re: glibc qsort() vulnerability

Mats, I apologize for steamrolling a bit here. I'll take a step back into
a reviewer role.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#20Mats Kindahl
mats@timescale.com
In reply to: Nathan Bossart (#9)
Re: glibc qsort() vulnerability

On Wed, Feb 7, 2024 at 9:56 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:

On Wed, Feb 07, 2024 at 08:46:56PM +0200, Heikki Linnakangas wrote:

Doesn't hurt to fix the comparison functions, and +1 on using the same
pattern everywhere.

I attached a new version of the patch with some small adjustments. I
haven't looked through all in-tree qsort() comparators to see if any others
need to be adjusted, but we should definitely do so as part of this thread.
Mats, are you able to do this?

Sure, I checked them and the only ones remaining are those using int16.
Shall I modify those as well?

Show quoted text

However, we use our qsort() with user-defined comparison functions, and

we

cannot make any guarantees about what they might do. So we must ensure

that

our qsort() doesn't overflow, no matter what the comparison function

does.

Looking at our ST_SORT(), it seems safe to me.

Cool.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#21Mats Kindahl
mats@timescale.com
In reply to: Nathan Bossart (#19)
#22Mats Kindahl
mats@timescale.com
In reply to: Mats Kindahl (#20)
#23Mats Kindahl
mats@timescale.com
In reply to: Nathan Bossart (#18)
#24Nathan Bossart
nathandbossart@gmail.com
In reply to: Mats Kindahl (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#24)
#26Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#25)
#27Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#26)
#28Mats Kindahl
mats@timescale.com
In reply to: Tom Lane (#25)
#29Mats Kindahl
mats@timescale.com
In reply to: Nathan Bossart (#27)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#27)
#31Mats Kindahl
mats@timescale.com
In reply to: Tom Lane (#30)
#32Andrey Borodin
amborodin@acm.org
In reply to: Nathan Bossart (#13)
#33Nathan Bossart
nathandbossart@gmail.com
In reply to: Mats Kindahl (#31)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#33)
#35Nathan Bossart
nathandbossart@gmail.com
In reply to: Andrey Borodin (#32)
#36Andres Freund
andres@anarazel.de
In reply to: Andrey Borodin (#32)
#37Andrey Borodin
amborodin@acm.org
In reply to: Nathan Bossart (#35)
#38Andres Freund
andres@anarazel.de
In reply to: Andrey Borodin (#37)
#39Mats Kindahl
mats@timescale.com
In reply to: Nathan Bossart (#33)
#40Mats Kindahl
mats@timescale.com
In reply to: Tom Lane (#34)
#41Mats Kindahl
mats@timescale.com
In reply to: Tom Lane (#34)
#42Mats Kindahl
mats@timescale.com
In reply to: Mats Kindahl (#39)
#43Nathan Bossart
nathandbossart@gmail.com
In reply to: Mats Kindahl (#41)
#44Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#43)
#45Nathan Bossart
nathandbossart@gmail.com
In reply to: Mats Kindahl (#42)
#46Mats Kindahl
mats@timescale.com
In reply to: Nathan Bossart (#45)
#47Nathan Bossart
nathandbossart@gmail.com
In reply to: Mats Kindahl (#46)
#48Mats Kindahl
mats@timescale.com
In reply to: Nathan Bossart (#47)
#49Nathan Bossart
nathandbossart@gmail.com
In reply to: Mats Kindahl (#48)
#50Mats Kindahl
mats@timescale.com
In reply to: Nathan Bossart (#49)
#51Nathan Bossart
nathandbossart@gmail.com
In reply to: Mats Kindahl (#50)
#52Fabrízio de Royes Mello
fabriziomello@gmail.com
In reply to: Nathan Bossart (#51)
#53Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#51)
#54Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#53)
#55Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#54)
#56Mats Kindahl
mats@timescale.com
In reply to: Andres Freund (#55)
#57Nathan Bossart
nathandbossart@gmail.com
In reply to: Mats Kindahl (#56)
#58Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#57)
#59Mats Kindahl
mats@timescale.com
In reply to: Nathan Bossart (#58)
#60Nathan Bossart
nathandbossart@gmail.com
In reply to: Mats Kindahl (#59)
#61Mats Kindahl
mats@timescale.com
In reply to: Nathan Bossart (#60)