to_hex() for negative inputs

Started by Dean Rasheedalmost 3 years ago10 messages

dean.a.rasheed@gmail.com

almost 3 years ago

I only recently realised that to_hex() converts its input to unsigned
before converting it to hex (something that's not mentioned in the
docs):

to_hex(-1) -> ffffffff

I think that's something that some users might find surprising,
especially if they were expecting to be able to use it to output
values that could be read back in, now that we support non-decimal
integer input.

So I think the docs should be a little more explicit about this. I
think the following should suffice:

---
to_hex ( integer ) -> text
to_hex ( bigint ) -> text

Converts the number to its equivalent two's complement hexadecimal
representation.

to_hex(1234) -> 4d2
to_hex(-1234) -> fffffb2e
---

instead of the existing example with 2147483647, which doesn't add much.

I also think it might be useful for it to gain a couple of boolean options:

1). An option to output a signed value (defaulting to false, to
preserve the current two's complement output).

2). An option to output the base prefix "0x" (which comes after the
sign, making it inconvenient for the user to add themselves).

I've also been idly wondering about whether we should have a numeric
variant (for integers only, since non-integers won't necessarily
terminate in hex, aren't accepted as inputs, and don't seem
particularly useful anyway). I don't think two's complement output
makes much sense for numeric, so perhaps it should only have the
prefix option, and always output signed hex strings.

Thoughts?

Regards,
Dean

Aleksander Alekseev

aleksander@timescale.com

almost 3 years ago

In reply to: Dean Rasheed (#1)

Re: to_hex() for negative inputs

Hi Dean,

I only recently realised that to_hex() converts its input to unsigned
before converting it to hex (something that's not mentioned in the
docs):

Technically the documentation is accurate [1]https://www.postgresql.org/docs/current/functions-string.html:

"""
Converts the number to its equivalent hexadecimal representation.
"""

But I agree that adding an example with negative numbers will not
hurt. Would you like to submit a patch?

I also think it might be useful for it to gain a couple of boolean options:

Adding extra arguments for something the user can implement
(him/her)self doesn't seem to be a great idea. With this approach we
may end up with hundreds of arguments one day.

1). An option to output a signed value (defaulting to false, to
preserve the current two's complement output).

This in particular can be done like this:

```
=# select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x))
from ( values (123), (-123) ) as s(x);
?column?
----------
7b
-7b
(2 rows)
```

2). An option to output the base prefix "0x" (which comes after the
sign, making it inconvenient for the user to add themselves).

Ditto:

```
=# select '0x' || to_hex(x) from ( values (123), (-123) ) as s(x);
?column?
------------
0x7b
0xffffff85
```

[1]: https://www.postgresql.org/docs/current/functions-string.html

--
Best regards,
Aleksander Alekseev

Dean Rasheed

dean.a.rasheed@gmail.com

almost 3 years ago

In reply to: Aleksander Alekseev (#2)

Re: to_hex() for negative inputs

On Tue, 24 Jan 2023 at 13:43, Aleksander Alekseev
<aleksander@timescale.com> wrote:

Adding extra arguments for something the user can implement
(him/her)self doesn't seem to be a great idea. With this approach we
may end up with hundreds of arguments one day.

I don't see how a couple of extra arguments will expand to hundreds.

=# select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x))
from ( values (123), (-123) ) as s(x);

Which is broken for INT_MIN:

select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x))
from ( values (-2147483648) ) as s(x);

ERROR: integer out of range

Part of the reason for implementing this in core would be to save
users from such easy-to-overlook bugs.

Regards,
Dean

Aleksander Alekseev

aleksander@timescale.com

almost 3 years ago

In reply to: Dean Rasheed (#3)

Re: to_hex() for negative inputs

Hi Dean,

I don't see how a couple of extra arguments will expand to hundreds.

Maybe I was exaggerating, but the point is that adding extra flags for
every possible scenario is a disadvantageous approach in general.
There is no need to increase the code base, the amount of test cases
and the amount of documentation every time someone has an idea "in
rare cases I also may want to do A or B, let's add a flag for this".

Which is broken for INT_MIN:

select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x))
from ( values (-2147483648) ) as s(x);

ERROR: integer out of range

I'm afraid the behavior of something like to_hex(X, with_sign => true)
is going to be exactly the same. There is no safe and consistent way
to calculate abs(INT_MIN).

--
Best regards,
Aleksander Alekseev

Dean Rasheed

dean.a.rasheed@gmail.com

almost 3 years ago

In reply to: Aleksander Alekseev (#4)

Re: to_hex() for negative inputs

On Wed, 25 Jan 2023 at 09:02, Aleksander Alekseev
<aleksander@timescale.com> wrote:

I don't see how a couple of extra arguments will expand to hundreds.

Maybe I was exaggerating, but the point is that adding extra flags for
every possible scenario is a disadvantageous approach in general.
There is no need to increase the code base, the amount of test cases
and the amount of documentation every time someone has an idea "in
rare cases I also may want to do A or B, let's add a flag for this".

OK, but the point was that we've just added support for accepting hex
inputs in a particular format, so I think it would be useful if
to_hex() could produce outputs compatible with that.

Which is broken for INT_MIN:

select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x))
from ( values (-2147483648) ) as s(x);

ERROR: integer out of range

I'm afraid the behavior of something like to_hex(X, with_sign => true)
is going to be exactly the same. There is no safe and consistent way
to calculate abs(INT_MIN).

Of course there is. This is easy to code in C using unsigned ints,
without resorting to abs() (yes, I'm aware that abs() is undefined for
INT_MIN).

Regards,
Dean

Aleksander Alekseev

aleksander@timescale.com

almost 3 years ago

In reply to: Dean Rasheed (#5)

Re: to_hex() for negative inputs

Hi Dean,

Of course there is. This is easy to code in C using unsigned ints,
without resorting to abs() (yes, I'm aware that abs() is undefined for
INT_MIN).

So in your opinion what is the expected result of to_hex(INT_MIN,
with_sign => true)?

--
Best regards,
Aleksander Alekseev

Dean Rasheed

dean.a.rasheed@gmail.com

almost 3 years ago

In reply to: Aleksander Alekseev (#6)

Re: to_hex() for negative inputs

On Wed, 25 Jan 2023 at 10:57, Aleksander Alekseev
<aleksander@timescale.com> wrote:

Of course there is. This is easy to code in C using unsigned ints,
without resorting to abs() (yes, I'm aware that abs() is undefined for
INT_MIN).

So in your opinion what is the expected result of to_hex(INT_MIN,
with_sign => true)?

"-80000000" or "-0x80000000", depending on whether the prefix is
requested. The latter is legal input for an integer, which is the
point (to allow round-tripping):

SELECT int '-0x80000000';
int4
-------------
-2147483648
(1 row)

SELECT pg_typeof(-0x80000000);
pg_typeof
-----------
integer
(1 row)

Regards,
Dean

Aleksander Alekseev

aleksander@timescale.com

almost 3 years ago

In reply to: Dean Rasheed (#7)

Re: to_hex() for negative inputs

Hi Dean,

So in your opinion what is the expected result of to_hex(INT_MIN,
with_sign => true)?

"-80000000" or "-0x80000000", depending on whether the prefix is
requested.

Whether this is the right result is very debatable. 0x80000000 is a
binary representation of -2147483648:

```
(gdb) ptype cur_timeout
type = int
(gdb) p cur_timeout = 0x80000000
$1 = -2147483648
(gdb) p/x cur_timeout
$2 = 0x80000000
```

So what you propose to return is -(-2147483648). For some users this
may be a wanted result, for some it may be not. Personally I would
prefer to get an ERROR in this case. And this is exactly how you end
up with even more flags.

I believe it would be better to let the user write the exact query
depending on what he/she wants.

--
Best regards,
Aleksander Alekseev

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 3 years ago

In reply to: Dean Rasheed (#1)

Re: to_hex() for negative inputs

On 24.01.23 14:10, Dean Rasheed wrote:

I also think it might be useful for it to gain a couple of boolean options:

1). An option to output a signed value (defaulting to false, to
preserve the current two's complement output).

I find the existing behavior so strange, I would rather give up and
invent a whole new function family with correct behavior, which could
then also support octal and binary, and perhaps any base. This could be
something generally named, like "convert" or "format".

2). An option to output the base prefix "0x" (which comes after the
sign, making it inconvenient for the user to add themselves).

This could also be handled with a "format"-like function.

#10

Dean Rasheed

dean.a.rasheed@gmail.com

almost 3 years ago

In reply to: Peter Eisentraut (#9)

Re: to_hex() for negative inputs

On Wed, 25 Jan 2023 at 21:43, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 24.01.23 14:10, Dean Rasheed wrote:

I also think it might be useful for it to gain a couple of boolean options:

1). An option to output a signed value (defaulting to false, to
preserve the current two's complement output).

I find the existing behavior so strange, I would rather give up and
invent a whole new function family with correct behavior, which could
then also support octal and binary, and perhaps any base. This could be
something generally named, like "convert" or "format".

2). An option to output the base prefix "0x" (which comes after the
sign, making it inconvenient for the user to add themselves).

This could also be handled with a "format"-like function.

Yeah, making a break from the existing to_hex() functions might not be
a bad idea.

My only concern with something general like convert() or format() is
that it'll end up being hard to use, with lots of different formatting
options, like to_char(). In fact Oracle's to_char() has an 'X'
formatting option to output hexadecimal, though it's pretty limited
(e.g., it doesn't support negative inputs). MySQL has conv() that'll
convert between any two bases, but it does bizarre things for negative
inputs or inputs larger than 2^64.

TBH, I'm not that interested in bases other than hex (I mean who
really uses octal anymore?), and I don't particularly care about
formats other than the format we accept as input. For that, just
adding a numeric_to_hex() function would suffice. That works fine for
int4 and int8 inputs too, and doesn't preclude adding a more general
base-conversion / formatting function later, if there's demand.

Regards,
Dean