Reliance on undefined behaviour in << operator

Started by Craig Ringerover 10 years ago7 messages
#1Craig Ringer
craig@2ndquadrant.com

Hi all

Our implementation of << is a direct wrapper around the C operator. It
does not check the right-hand side's value.

Datum
int8shl(PG_FUNCTION_ARGS)
{
int64 arg1 = PG_GETARG_INT64(0);
int32 arg2 = PG_GETARG_INT32(1);

PG_RETURN_INT64(arg1 << arg2);
}

This means that an operation like:

1::bigint << 65

directly relies on the compiler and platforms' handling of the
undefined shift. On x64 intel gcc linux it does a rotation but that's
not AFAIK guaranteed by anything, and we should probably not be
relying on this or exposing it at the user level.

Pg returns:

test=> SELECT BIGINT '1' << 66;
?column?
----------
4
(1 row)

A test program:

#include "stdio.h"
int main(int argc, char * argv[])
{
printf("Result is %ld", 1l << 66);
return 0;
}

returns zero when the compiler constant-folds, but when done at runtime:

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char * argv[])
{
const char * num = "66";
printf("Result is %ld", 1l << atoi(num));
return 0;
}

IMO we should specify the behaviour in this case. Then issue a WARNING
that gets promoted to an ERROR in a few versions.

Consideration of << with a negative right-operand, and of
out-of-bounds >>, is probably also needed.

Thoughts?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Craig Ringer (#1)
Re: Reliance on undefined behaviour in << operator

Hello, I don't think so many people have used shift operators
with too-large or negative shift amount relying on the
undetermined behavior.

But if explicit definition is required, I prefer the result of a
shift operation with too-large shift mount is simplly zero. And
shift left with negative shift amount should do right
shift. Addition to that, no error nor warning won't be needed.

Like this,

Datum
int8shl(PG_FUNCTION_ARGS)
{
int64 arg1 = PG_GETARG_INT64(0);
int32 arg2 = PG_GETARG_INT32(1);

if (arg2 > 63 || arg2 < -63) PG_RETURN_INT64(0L);
if (arg2 < 0)
PG_RETURN_INT64(arg1 >> (-arg2));

PG_RETURN_INT64(arg1 << arg2);
}

The obvious problem on this is the lack of compatibility with
existing behavior:(

Thoughts? Opinions?

regards,

At Wed, 16 Sep 2015 15:16:27 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in <CAMsr+YE+0KJuOJfbB2nLVfU+14R50Yi90e_8DewLV9jX+ro1zg@mail.gmail.com>

Hi all

Our implementation of << is a direct wrapper around the C operator. It
does not check the right-hand side's value.

Datum
int8shl(PG_FUNCTION_ARGS)
{
int64 arg1 = PG_GETARG_INT64(0);
int32 arg2 = PG_GETARG_INT32(1);

PG_RETURN_INT64(arg1 << arg2);
}

This means that an operation like:

1::bigint << 65

directly relies on the compiler and platforms' handling of the
undefined shift. On x64 intel gcc linux it does a rotation but that's
not AFAIK guaranteed by anything, and we should probably not be
relying on this or exposing it at the user level.

Pg returns:

test=> SELECT BIGINT '1' << 66;
?column?
----------
4
(1 row)

A test program:

#include "stdio.h"
int main(int argc, char * argv[])
{
printf("Result is %ld", 1l << 66);
return 0;
}

returns zero when the compiler constant-folds, but when done at runtime:

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char * argv[])
{
const char * num = "66";
printf("Result is %ld", 1l << atoi(num));
return 0;
}

IMO we should specify the behaviour in this case. Then issue a WARNING
that gets promoted to an ERROR in a few versions.

Consideration of << with a negative right-operand, and of
out-of-bounds >>, is probably also needed.

Thoughts?

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Robert Haas
robertmhaas@gmail.com
In reply to: Craig Ringer (#1)
Re: Reliance on undefined behaviour in << operator

On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

Our implementation of << is a direct wrapper around the C operator. It
does not check the right-hand side's value.

Datum
int8shl(PG_FUNCTION_ARGS)
{
int64 arg1 = PG_GETARG_INT64(0);
int32 arg2 = PG_GETARG_INT32(1);

PG_RETURN_INT64(arg1 << arg2);
}

This means that an operation like:

1::bigint << 65

directly relies on the compiler and platforms' handling of the
undefined shift. On x64 intel gcc linux it does a rotation but that's
not AFAIK guaranteed by anything, and we should probably not be
relying on this or exposing it at the user level.

I agree.

Pg returns:

test=> SELECT BIGINT '1' << 66;
?column?
----------
4
(1 row)

A test program:

#include "stdio.h"
int main(int argc, char * argv[])
{
printf("Result is %ld", 1l << 66);
return 0;
}

returns zero when the compiler constant-folds, but when done at runtime:

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char * argv[])
{
const char * num = "66";
printf("Result is %ld", 1l << atoi(num));
return 0;
}

IMO we should specify the behaviour in this case. Then issue a WARNING
that gets promoted to an ERROR in a few versions.

I disagree. Such warnings are prone to be really annoying, e.g. by
generating massive log spam. I'd say that we should either make a
large shift return 0 - which seems like the intuitively right behavior
to me: if you shift in N zeros where N is greater than the the word
size, then you should end up with all zeroes - or throw an error right
away. I'd vote for the former.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: Reliance on undefined behaviour in << operator

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

Our implementation of << is a direct wrapper around the C operator. It
does not check the right-hand side's value.
... On x64 intel gcc linux it does a rotation but that's
not AFAIK guaranteed by anything, and we should probably not be
relying on this or exposing it at the user level.

I agree.

As far as I'm concerned, what those operators mean is "whatever your
compiler makes them mean". This is hardly the only place where we expose
platform-dependent behavior --- see also locale dependencies, timezones,
floating point, yadda yadda --- and I do not find it the most compelling
place to start reversing that general approach.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
Re: Reliance on undefined behaviour in << operator

On Wed, Sep 16, 2015 at 3:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

Our implementation of << is a direct wrapper around the C operator. It
does not check the right-hand side's value.
... On x64 intel gcc linux it does a rotation but that's
not AFAIK guaranteed by anything, and we should probably not be
relying on this or exposing it at the user level.

I agree.

As far as I'm concerned, what those operators mean is "whatever your
compiler makes them mean". This is hardly the only place where we expose
platform-dependent behavior --- see also locale dependencies, timezones,
floating point, yadda yadda --- and I do not find it the most compelling
place to start reversing that general approach.

No, I don't agree, in this case. You could say that int32 + int32 is
platform-dependent behavior too, when it overflows. But we don't say
that. This case seems more like an overflow condition than it does
like platform-dependent behavior that we should just pass through.

Also, I think it's worth noting that, AFAICT, our users %@#! HATE the
places where we expose platform-dependent behavior. That's why people
keep proposing ICU for collations, and cursing their fate when Red Hat
rolls out a glibc fix that changes some collation's ordering thus
leaving all their indexes "corrupted".

Of course, we're not in a position to eliminate all of those platform
dependencies, because it would require us to integrate with - or
develop - platform-dependent libraries for all of those things. And
we don't really have the bandwidth for that, or at least it's not
likely the best use of our time. But all things being equal, I don't
think the fact that we have platform dependencies that are hard to
eliminate means we should keep around the ones that are easy to
eliminate.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#4)
Re: Reliance on undefined behaviour in << operator

On 2015-09-16 15:57:04 -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

Our implementation of << is a direct wrapper around the C operator. It
does not check the right-hand side's value.
... On x64 intel gcc linux it does a rotation but that's
not AFAIK guaranteed by anything, and we should probably not be
relying on this or exposing it at the user level.

I agree.

As far as I'm concerned, what those operators mean is "whatever your
compiler makes them mean".

According to C that's undefined behaviour. So in the extreme sense that
could mean that the instruction could trigger a SIGBUS or something.

This is hardly the only place where we expose
platform-dependent behavior --- see also locale dependencies, timezones,
floating point, yadda yadda --- and I do not find it the most compelling
place to start reversing that general approach.

But in other places We do overflow checks, so I don't think that'd be
reversal of a general approach.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7ktm@rice.edu
ktm@rice.edu
In reply to: Tom Lane (#4)
Re: Reliance on undefined behaviour in << operator

On Wed, Sep 16, 2015 at 03:57:04PM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

Our implementation of << is a direct wrapper around the C operator. It
does not check the right-hand side's value.
... On x64 intel gcc linux it does a rotation but that's
not AFAIK guaranteed by anything, and we should probably not be
relying on this or exposing it at the user level.

I agree.

As far as I'm concerned, what those operators mean is "whatever your
compiler makes them mean". This is hardly the only place where we expose
platform-dependent behavior --- see also locale dependencies, timezones,
floating point, yadda yadda --- and I do not find it the most compelling
place to start reversing that general approach.

regards, tom lane

+1

I tend to agree. Unless the behavior is mandated by the SQL standard, I have
always expected the behavior of those apps to follow that defined by the
compiler.

Regards,
Ken

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers