Fix formatting of Interval output

Started by Joseph Koshakowover 4 years ago6 messageshackers

koshy44@gmail.com

over 4 years ago

When formatting the output of an Interval, we call abs() on the hours
field of the Interval. Calling abs(INT_MIN) returns back INT_MIN
causing the output to contain two '-' characters. The attached patch
fixes that issue by special casing INT_MIN hours.

Here is an example of the issue:
postgres=# SELECT INTERVAL '-2147483648 hrs';
interval
--------------------
--2147483648:00:00
(1 row)

Cheers,
Joe Koshakow

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Joseph Koshakow (#1)

Re: Fix formatting of Interval output

Joseph Koshakow <koshy44@gmail.com> writes:

When formatting the output of an Interval, we call abs() on the hours
field of the Interval. Calling abs(INT_MIN) returns back INT_MIN
causing the output to contain two '-' characters. The attached patch
fixes that issue by special casing INT_MIN hours.

Good catch, but it looks to me like three out of the four formats in
EncodeInterval have variants of this problem --- there are assumptions
throughout that code that we can compute "-x" or "abs(x)" without
fear. Not much point in fixing only one symptom.

Also, I notice that there's an overflow hazard upstream of here,
in interval2tm:

regression=# select interval '214748364 hours' * 11;
ERROR: interval out of range
regression=# \errverbose
ERROR: 22008: interval out of range
LOCATION: interval2tm, timestamp.c:1982

There's no good excuse for not being able to print a value that
we computed successfully.

I wonder if the most reasonable fix would be to start using int64
instead of int arithmetic for the values that are potentially large.
I doubt that we'd be taking much of a performance hit on modern
hardware.

regards, tom lane

Joseph Koshakow

koshy44@gmail.com

over 4 years ago

In reply to: Tom Lane (#2)

Re: Fix formatting of Interval output

On Fri, Feb 18, 2022 at 11:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Joseph Koshakow <koshy44@gmail.com> writes:

When formatting the output of an Interval, we call abs() on the hours
field of the Interval. Calling abs(INT_MIN) returns back INT_MIN
causing the output to contain two '-' characters. The attached patch
fixes that issue by special casing INT_MIN hours.

Good catch, but it looks to me like three out of the four formats in
EncodeInterval have variants of this problem --- there are assumptions
throughout that code that we can compute "-x" or "abs(x)" without
fear. Not much point in fixing only one symptom.

Also, I notice that there's an overflow hazard upstream of here,
in interval2tm:

regression=# select interval '214748364 hours' * 11;
ERROR: interval out of range
regression=# \errverbose
ERROR: 22008: interval out of range
LOCATION: interval2tm, timestamp.c:1982

There's no good excuse for not being able to print a value that
we computed successfully.

I wonder if the most reasonable fix would be to start using int64
instead of int arithmetic for the values that are potentially large.
I doubt that we'd be taking much of a performance hit on modern
hardware.

regards, tom lane

That's an interesting idea. I've always assumed that the range of the
time fields of Intervals was 2147483647 hours 59 minutes
59.999999 seconds to -2147483648 hours -59 minutes
-59.999999 seconds. However the only reason that we can't support
the full range of int64 microseconds is because the struct pg_tm fields
are only ints. If we increase those fields to int64 then we'd be able to
support the full int64 range for microseconds as well as implicitly fix
some of the overflow issues in DecodeInterval and EncodeInterval.

- Joe Koshakow

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Joseph Koshakow (#3)

Re: Fix formatting of Interval output

Joseph Koshakow <koshy44@gmail.com> writes:

On Fri, Feb 18, 2022 at 11:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wonder if the most reasonable fix would be to start using int64
instead of int arithmetic for the values that are potentially large.
I doubt that we'd be taking much of a performance hit on modern
hardware.

That's an interesting idea. I've always assumed that the range of the
time fields of Intervals was 2147483647 hours 59 minutes
59.999999 seconds to -2147483648 hours -59 minutes
-59.999999 seconds. However the only reason that we can't support
the full range of int64 microseconds is because the struct pg_tm fields
are only ints. If we increase those fields to int64 then we'd be able to
support the full int64 range for microseconds as well as implicitly fix
some of the overflow issues in DecodeInterval and EncodeInterval.

I think that messing with struct pg_tm might have too many side-effects.
However, pg_tm isn't all that well adapted to intervals in the first
place, so it'd make sense to make a new struct specifically for interval
decoding.

regards, tom lane

Joseph Koshakow

koshy44@gmail.com

over 4 years ago

In reply to: Tom Lane (#4)

Re: Fix formatting of Interval output

On Sat, Feb 19, 2022 at 12:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think that messing with struct pg_tm might have too many side-effects.
However, pg_tm isn't all that well adapted to intervals in the first
place, so it'd make sense to make a new struct specifically for interval
decoding.

Yeah that's a good idea, pg_tm is used all over the place. Is there a
reason we need an intermediate structure to convert to and from?
We could just convert directly to an Interval in DecodeInterval and
print directly from an Interval in EncodeInterval.

Also I originally created a separate thread and patch because I
thought this wouldn't be directly related to my other patch,
https://commitfest.postgresql.org/37/3541/. However I think with these
discussed changes it's directly related. So I think it's best to close
this thread and patch and copy this conversion to the other thread.
Sorry for fragmenting the conversation.

- Joe Koshakow

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Joseph Koshakow (#5)

Re: Fix formatting of Interval output

Joseph Koshakow <koshy44@gmail.com> writes:

On Sat, Feb 19, 2022 at 12:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think that messing with struct pg_tm might have too many side-effects.
However, pg_tm isn't all that well adapted to intervals in the first
place, so it'd make sense to make a new struct specifically for interval
decoding.

Yeah that's a good idea, pg_tm is used all over the place. Is there a
reason we need an intermediate structure to convert to and from?
We could just convert directly to an Interval in DecodeInterval and
print directly from an Interval in EncodeInterval.

Yeah, I thought about that too, but if you look at the other callers of
interval2tm, you'll see this same set of issues. I'm inclined to keep
the current code structure and just fix the data structure. Although
interval2tm isn't *that* big, I don't think four copies of its logic
would be an improvement.

Also I originally created a separate thread and patch because I
thought this wouldn't be directly related to my other patch,
https://commitfest.postgresql.org/37/3541/. However I think with these
discussed changes it's directly related. So I think it's best to close
this thread and patch and copy this conversion to the other thread.

Agreed.

regards, tom lane

Fix formatting of Interval output

Attachments: