glibc qsort() vulnerability

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Mats Kindahl (#1)

Re: glibc qsort() vulnerability

Mats Kindahl <mats@timescale.com> writes:

There is a bug in glibc's qsort() algorithm that runs the risk of creating
an out-of-bounds error if the comparison function is not transitive, for
example, if subtraction is used so that it can create an overflow.

We don't use glibc's qsort. Have you checked whether there's a
problem with the code we do use?

regards, tom lane

nathandbossart@gmail.com

about 2 years ago

In reply to: Tom Lane (#2)

Re: glibc qsort() vulnerability

On Tue, Feb 06, 2024 at 10:11:16AM -0500, Tom Lane wrote:

Mats Kindahl <mats@timescale.com> writes:

There is a bug in glibc's qsort() algorithm that runs the risk of creating
an out-of-bounds error if the comparison function is not transitive, for
example, if subtraction is used so that it can create an overflow.

We don't use glibc's qsort. Have you checked whether there's a
problem with the code we do use?

Oh, interesting. I didn't know that! As of commit 6edd2b4, we've used an
in-tree qsort() for everything. port.h has the following line:

#define qsort(a,b,c,d) pg_qsort(a,b,c,d)

I see a handful of callers that use pg_qsort() directly, which IMO is odd.
I would've expected that to be something different if I didn't know about
that macro. Maybe we should change those to qsort()...

Even if the glibc issue doesn't apply to Postgres, I'm tempted to suggest
that we make it project policy that comparison functions must be
transitive. There might be no real issues today, but if we write all
comparison functions the way Mats is suggesting, it should be easier to
reason about overflow risks.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Nathan Bossart (#3)

Re: glibc qsort() vulnerability

Nathan Bossart <nathandbossart@gmail.com> writes:

Even if the glibc issue doesn't apply to Postgres, I'm tempted to suggest
that we make it project policy that comparison functions must be
transitive. There might be no real issues today, but if we write all
comparison functions the way Mats is suggesting, it should be easier to
reason about overflow risks.

A comparison routine that is not is probably broken, agreed.
I didn't look through the details of the patch --- I was more
curious whether we had a version of the qsort bug, because
if we do, we should fix that too.

regards, tom lane

nathandbossart@gmail.com

about 2 years ago

In reply to: Tom Lane (#4)

Re: glibc qsort() vulnerability

On Tue, Feb 06, 2024 at 03:55:58PM -0500, Tom Lane wrote:

A comparison routine that is not is probably broken, agreed.
I didn't look through the details of the patch --- I was more
curious whether we had a version of the qsort bug, because
if we do, we should fix that too.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

mats@timescale.com

about 2 years ago

In reply to: Tom Lane (#2)

Re: glibc qsort() vulnerability

On Tue, Feb 6, 2024 at 4:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Mats Kindahl <mats@timescale.com> writes:

There is a bug in glibc's qsort() algorithm that runs the risk of

creating

an out-of-bounds error if the comparison function is not transitive, for
example, if subtraction is used so that it can create an overflow.

We don't use glibc's qsort. Have you checked whether there's a
problem with the code we do use?

Interesting. No, haven't checked. Will do that.

Best wishes,
Mats Kindahl

Show quoted text

regards, tom lane

mats@timescale.com

about 2 years ago

In reply to: Tom Lane (#4)

Re: glibc qsort() vulnerability

On Tue, Feb 6, 2024 at 9:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Nathan Bossart <nathandbossart@gmail.com> writes:

Even if the glibc issue doesn't apply to Postgres, I'm tempted to suggest
that we make it project policy that comparison functions must be
transitive. There might be no real issues today, but if we write all
comparison functions the way Mats is suggesting, it should be easier to
reason about overflow risks.

A comparison routine that is not is probably broken, agreed.
I didn't look through the details of the patch --- I was more
curious whether we had a version of the qsort bug, because
if we do, we should fix that too.

The patch basically removes the risk of overflow in three routines and just
returns -1, 0, or 1, and adds a comment in one.

The routines modified do a subtraction of int:s and return that, which can
cause an overflow. This method is used for some int16 as well but since
standard conversions in C will perform the arithmetics in "int" precision,
this cannot overflow, so added a comment there. It might still be a good
idea to follow the same pattern for the int16 routines, but since there is
no bug there, I did not add them to the patch.

Best wishes,
Mats Kindahl

Show quoted text

regards, tom lane

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

about 2 years ago

In reply to: Mats Kindahl (#7)

Re: glibc qsort() vulnerability

On 07/02/2024 11:09, Mats Kindahl wrote:

On Tue, Feb 6, 2024 at 9:56 PM Tom Lane <tgl@sss.pgh.pa.us
<mailto:tgl@sss.pgh.pa.us>> wrote:

Nathan Bossart <nathandbossart@gmail.com
<mailto:nathandbossart@gmail.com>> writes:

Even if the glibc issue doesn't apply to Postgres, I'm tempted to

suggest

that we make it project policy that comparison functions must be
transitive. There might be no real issues today, but if we write all
comparison functions the way Mats is suggesting, it should be

easier to

reason about overflow risks.

A comparison routine that is not is probably broken, agreed.
I didn't look through the details of the patch --- I was more
curious whether we had a version of the qsort bug, because
if we do, we should fix that too.

The patch basically removes the risk of overflow in three routines and
just returns -1, 0, or 1, and adds a comment in one.

The routines modified do a subtraction of int:s and return that, which
can cause an overflow. This method is used for some int16 as well but
since standard conversions in C will perform the arithmetics in "int"
precision, this cannot overflow, so added a comment there. It might
still be a good idea to follow the same pattern for the int16 routines,
but since there is no bug there, I did not add them to the patch.

Doesn't hurt to fix the comparison functions, and +1 on using the same
pattern everywhere.

However, we use our qsort() with user-defined comparison functions, and
we cannot make any guarantees about what they might do. So we must
ensure that our qsort() doesn't overflow, no matter what the comparison
function does.

Looking at our ST_SORT(), it seems safe to me.

--
Heikki Linnakangas
Neon (https://neon.tech)

nathandbossart@gmail.com

about 2 years ago

In reply to: Heikki Linnakangas (#8)

Re: glibc qsort() vulnerability

On Wed, Feb 07, 2024 at 08:46:56PM +0200, Heikki Linnakangas wrote:

Doesn't hurt to fix the comparison functions, and +1 on using the same
pattern everywhere.

I attached a new version of the patch with some small adjustments. I
haven't looked through all in-tree qsort() comparators to see if any others
need to be adjusted, but we should definitely do so as part of this thread.
Mats, are you able to do this?

However, we use our qsort() with user-defined comparison functions, and we
cannot make any guarantees about what they might do. So we must ensure that
our qsort() doesn't overflow, no matter what the comparison function does.

Looking at our ST_SORT(), it seems safe to me.

Cool.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#10

andres@anarazel.de

about 2 years ago

In reply to: Heikki Linnakangas (#8)

Re: glibc qsort() vulnerability

Hi,

On 2024-02-07 20:46:56 +0200, Heikki Linnakangas wrote:

The routines modified do a subtraction of int:s and return that, which
can cause an overflow. This method is used for some int16 as well but
since standard conversions in C will perform the arithmetics in "int"
precision, this cannot overflow, so added a comment there. It might
still be a good idea to follow the same pattern for the int16 routines,
but since there is no bug there, I did not add them to the patch.

Doesn't hurt to fix the comparison functions, and +1 on using the same
pattern everywhere.

It actually can hurt - the generated code will often be slower.

E.g.
#include <stdint.h>

int cmp_sub(int16_t a, int16_t b) {
return (int32_t) a - (int32_t) b;
}

int cmp_if(int16_t a, int16_t b) {
if (a < b)
return -1;
if (a > b)
return 1;
return 0;
}

yields

cmp_sub:
movsx eax, di
movsx esi, si
sub eax, esi
ret
cmp_if:
xor eax, eax
cmp di, si
mov edx, -1
setg al
cmovl eax, edx
ret

with gcc -O3. With other compilers, e.g. msvc, the difference is considerably
bigger, due to msvc for some reason not using cmov.

See https://godbolt.org/z/34qerPaPE for a few more details.

Now, in most cases this won't matter, the sorting isn't performance
critical. But I don't think it's a good idea to standardize on a generally
slower pattern.

Not that that's a good test, but I did quickly benchmark [1]-- prep CREATE EXTENSION IF NOT EXISTS intarray; DROP TABLE IF EXISTS arrays_to_sort; CREATE TABLE arrays_to_sort AS SELECT array_shuffle(a) arr FROM (SELECT ARRAY(SELECT generate_series(1, 1000000)) a), generate_series(1, 10); -- bench SELECT (sort(arr))[1] FROM arrays_to_sort; this with
intarray. There's about a 10% difference in performance between using the
existing compASC() and one using
return (int64) *(const int32 *) a - (int64) *(const int32 *) b;

Perhaps we could have a central helper for this somewhere?

Greetings,

Andres Freund

[1]: -- prep CREATE EXTENSION IF NOT EXISTS intarray; DROP TABLE IF EXISTS arrays_to_sort; CREATE TABLE arrays_to_sort AS SELECT array_shuffle(a) arr FROM (SELECT ARRAY(SELECT generate_series(1, 1000000)) a), generate_series(1, 10); -- bench SELECT (sort(arr))[1] FROM arrays_to_sort;
-- prep
CREATE EXTENSION IF NOT EXISTS intarray;
DROP TABLE IF EXISTS arrays_to_sort;
CREATE TABLE arrays_to_sort AS SELECT array_shuffle(a) arr FROM (SELECT ARRAY(SELECT generate_series(1, 1000000)) a), generate_series(1, 10);
-- bench
SELECT (sort(arr))[1]-- prep CREATE EXTENSION IF NOT EXISTS intarray; DROP TABLE IF EXISTS arrays_to_sort; CREATE TABLE arrays_to_sort AS SELECT array_shuffle(a) arr FROM (SELECT ARRAY(SELECT generate_series(1, 1000000)) a), generate_series(1, 10); -- bench SELECT (sort(arr))[1] FROM arrays_to_sort; FROM arrays_to_sort;

#11

nathandbossart@gmail.com

about 2 years ago

In reply to: Andres Freund (#10)

Re: glibc qsort() vulnerability

On Wed, Feb 07, 2024 at 01:48:57PM -0800, Andres Freund wrote:

Now, in most cases this won't matter, the sorting isn't performance
critical. But I don't think it's a good idea to standardize on a generally
slower pattern.

Not that that's a good test, but I did quickly benchmark [1] this with
intarray. There's about a 10% difference in performance between using the
existing compASC() and one using
return (int64) *(const int32 *) a - (int64) *(const int32 *) b;

Perhaps we could have a central helper for this somewhere?

Maybe said helper could use __builtin_sub_overflow() and fall back to the
slow "if" version only if absolutely necessary. The assembly for that
looks encouraging, but I still need to actually test it...

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#12

andres@anarazel.de

about 2 years ago

In reply to: Nathan Bossart (#11)

Re: glibc qsort() vulnerability

Hi,

On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote:

On Wed, Feb 07, 2024 at 01:48:57PM -0800, Andres Freund wrote:

Now, in most cases this won't matter, the sorting isn't performance
critical. But I don't think it's a good idea to standardize on a generally
slower pattern.

Not that that's a good test, but I did quickly benchmark [1] this with
intarray. There's about a 10% difference in performance between using the
existing compASC() and one using
return (int64) *(const int32 *) a - (int64) *(const int32 *) b;

Perhaps we could have a central helper for this somewhere?

Maybe said helper could use __builtin_sub_overflow() and fall back to the
slow "if" version only if absolutely necessary.

I suspect that'll be worse code in the common case, given the cmov generated
by gcc & clang for the typical branch-y formulation. But it's worth testing.

The assembly for that looks encouraging, but I still need to actually test
it...

Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit
that doesn't work, given the 32bit return, so we need something more.

Greetings,

Andres Freund

#13

nathandbossart@gmail.com

about 2 years ago

In reply to: Andres Freund (#12)

Re: glibc qsort() vulnerability

On Wed, Feb 07, 2024 at 04:42:07PM -0800, Andres Freund wrote:

On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote:

The assembly for that looks encouraging, but I still need to actually test
it...

Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit
that doesn't work, given the 32bit return, so we need something more.

For the same compASC() test, I see an ~8.4% improvement with your int64
code and a ~3.4% improvement with this:

int
compASC(const void *a, const void *b)
{
int result;

if (unlikely(pg_sub_s32_overflow(*(const int32 *) a,
*(const int32 *) b,
&result)))
{
if (*(const int32 *) a > *(const int32 *) b)
return 1;
if (*(const int32 *) a < *(const int32 *) b)
return -1;
return 0;
}

return result;
}

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#14

https://stackoverflow.com/questions/14579920/fast-sign-of-integer-in-c

andres@anarazel.de

about 2 years ago

In reply to: Nathan Bossart (#13)

Re: glibc qsort() vulnerability

Hi,

On 2024-02-07 19:52:11 -0600, Nathan Bossart wrote:

On Wed, Feb 07, 2024 at 04:42:07PM -0800, Andres Freund wrote:

On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote:

The assembly for that looks encouraging, but I still need to actually test
it...

Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit
that doesn't work, given the 32bit return, so we need something more.

For the same compASC() test, I see an ~8.4% improvement with your int64
code

Just to be clear, that code unfortuntely isn't correct, the return value is a
32 bit integer, so the 64bit difference doesn't help. In contrast to the 16bit
case.

and a ~3.4% improvement with this:

I guess that's still something.

Another branchless variant is (a > b) - (a < b). It seems to get a similar
improvement as the overflow-checking version.

Greetings,

Andres Freund

#15

Thomas Munro

thomas.munro@gmail.com

about 2 years ago

In reply to: Andres Freund (#14)

Re: glibc qsort() vulnerability

On Thu, Feb 8, 2024 at 3:06 PM Andres Freund <andres@anarazel.de> wrote:

On 2024-02-07 19:52:11 -0600, Nathan Bossart wrote:

On Wed, Feb 07, 2024 at 04:42:07PM -0800, Andres Freund wrote:

On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote:

The assembly for that looks encouraging, but I still need to actually test
it...

Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit
that doesn't work, given the 32bit return, so we need something more.

For the same compASC() test, I see an ~8.4% improvement with your int64
code

Just to be clear, that code unfortuntely isn't correct, the return value is a
32 bit integer, so the 64bit difference doesn't help. In contrast to the 16bit
case.

Perhaps you could wrap it in a branch-free sign() function so you get
a narrow answer?

#16

nathandbossart@gmail.com

about 2 years ago

In reply to: Andres Freund (#14)

Re: glibc qsort() vulnerability

On Wed, Feb 07, 2024 at 06:06:37PM -0800, Andres Freund wrote:

Another branchless variant is (a > b) - (a < b). It seems to get a similar
improvement as the overflow-checking version.

Well, that's certainly more elegant. I updated the patch to that approach
for now.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#17

Thomas Munro

thomas.munro@gmail.com

about 2 years ago

In reply to: Thomas Munro (#15)

Re: glibc qsort() vulnerability

On Thu, Feb 8, 2024 at 3:38 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Perhaps you could wrap it in a branch-free sign() function so you get
a narrow answer?

https://stackoverflow.com/questions/14579920/fast-sign-of-integer-in-c

Ah, strike that, it is much like the suggested (a > b) - (a < b) but
with extra steps...

#18

nathandbossart@gmail.com

about 2 years ago

In reply to: Thomas Munro (#17)

Re: glibc qsort() vulnerability

On Thu, Feb 08, 2024 at 03:49:03PM +1300, Thomas Munro wrote:

On Thu, Feb 8, 2024 at 3:38 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Perhaps you could wrap it in a branch-free sign() function so you get
a narrow answer?

https://stackoverflow.com/questions/14579920/fast-sign-of-integer-in-c

Ah, strike that, it is much like the suggested (a > b) - (a < b) but
with extra steps...

Yeah, https://godbolt.org/ indicates that the sign approach compiles to

movsx rsi, esi
movsx rdi, edi
xor eax, eax
sub rdi, rsi
test rdi, rdi
setg al
shr rdi, 63
sub eax, edi
ret

while the approach Andres suggested compiles to

xor eax, eax
cmp edi, esi
setl dl
setg al
movzx edx, dl
sub eax, edx
ret

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#19

nathandbossart@gmail.com

about 2 years ago

In reply to: Nathan Bossart (#18)

Re: glibc qsort() vulnerability

Mats, I apologize for steamrolling a bit here. I'll take a step back into
a reviewer role.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#20

mats@timescale.com

about 2 years ago

In reply to: Nathan Bossart (#9)

Re: glibc qsort() vulnerability

On Wed, Feb 7, 2024 at 9:56 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:

On Wed, Feb 07, 2024 at 08:46:56PM +0200, Heikki Linnakangas wrote:

Doesn't hurt to fix the comparison functions, and +1 on using the same
pattern everywhere.

I attached a new version of the patch with some small adjustments. I
haven't looked through all in-tree qsort() comparators to see if any others
need to be adjusted, but we should definitely do so as part of this thread.
Mats, are you able to do this?

Sure, I checked them and the only ones remaining are those using int16.
Shall I modify those as well?

Show quoted text

However, we use our qsort() with user-defined comparison functions, and

we

cannot make any guarantees about what they might do. So we must ensure

that

our qsort() doesn't overflow, no matter what the comparison function

does.

Looking at our ST_SORT(), it seems safe to me.

Cool.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#21

mats@timescale.com

about 2 years ago

In reply to: Nathan Bossart (#19)

#22

mats@timescale.com

about 2 years ago

In reply to: Mats Kindahl (#20)

#23

mats@timescale.com

about 2 years ago

In reply to: Nathan Bossart (#18)

#24

nathandbossart@gmail.com

about 2 years ago

In reply to: Mats Kindahl (#23)

#25

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Nathan Bossart (#24)

#26

andres@anarazel.de

about 2 years ago

In reply to: Tom Lane (#25)

#27

nathandbossart@gmail.com

about 2 years ago

In reply to: Andres Freund (#26)

#28

mats@timescale.com

about 2 years ago

In reply to: Tom Lane (#25)

#29

mats@timescale.com

about 2 years ago

In reply to: Nathan Bossart (#27)

#30

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Nathan Bossart (#27)

#31

mats@timescale.com

about 2 years ago

In reply to: Tom Lane (#30)

#32

Andrey Borodin

amborodin@acm.org

about 2 years ago

In reply to: Nathan Bossart (#13)

#33

nathandbossart@gmail.com

about 2 years ago

In reply to: Mats Kindahl (#31)

#34

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Nathan Bossart (#33)

#35

nathandbossart@gmail.com

about 2 years ago

In reply to: Andrey Borodin (#32)

#36

andres@anarazel.de

about 2 years ago

In reply to: Andrey Borodin (#32)

#37

Andrey Borodin

amborodin@acm.org

about 2 years ago

In reply to: Nathan Bossart (#35)

#38

andres@anarazel.de

about 2 years ago

In reply to: Andrey Borodin (#37)

#39

mats@timescale.com

about 2 years ago

In reply to: Nathan Bossart (#33)

#40

mats@timescale.com

about 2 years ago

In reply to: Tom Lane (#34)

#41

mats@timescale.com

about 2 years ago

In reply to: Tom Lane (#34)

#42

mats@timescale.com

about 2 years ago

In reply to: Mats Kindahl (#39)

#43

nathandbossart@gmail.com

about 2 years ago

In reply to: Mats Kindahl (#41)

#44

andres@anarazel.de

about 2 years ago

In reply to: Nathan Bossart (#43)

#45

nathandbossart@gmail.com

about 2 years ago

In reply to: Mats Kindahl (#42)

#46

mats@timescale.com

about 2 years ago

In reply to: Nathan Bossart (#45)

#47

nathandbossart@gmail.com

about 2 years ago

In reply to: Mats Kindahl (#46)

#48

mats@timescale.com

about 2 years ago

In reply to: Nathan Bossart (#47)

#49

nathandbossart@gmail.com

about 2 years ago

In reply to: Mats Kindahl (#48)

#50

mats@timescale.com

about 2 years ago

In reply to: Nathan Bossart (#49)

#51

nathandbossart@gmail.com

about 2 years ago

In reply to: Mats Kindahl (#50)

#52

Fabrízio de Royes Mello

fabriziomello@gmail.com

about 2 years ago

In reply to: Nathan Bossart (#51)

#53

andres@anarazel.de

about 2 years ago

In reply to: Nathan Bossart (#51)

#54

nathandbossart@gmail.com

about 2 years ago

In reply to: Andres Freund (#53)

#55

andres@anarazel.de

about 2 years ago

In reply to: Nathan Bossart (#54)

#56

mats@timescale.com

about 2 years ago

In reply to: Andres Freund (#55)

#57

nathandbossart@gmail.com

about 2 years ago

In reply to: Mats Kindahl (#56)

#58

nathandbossart@gmail.com

about 2 years ago

In reply to: Nathan Bossart (#57)

#59

mats@timescale.com

about 2 years ago

In reply to: Nathan Bossart (#58)

#60

nathandbossart@gmail.com

about 2 years ago

In reply to: Mats Kindahl (#59)

#61