BUG #17795: Erroneous parsing of floating-poing components in DecodeISO8601Interval()

Started by PG Bug reporting formabout 3 years ago4 messagesbugs

noreply@postgresql.org

about 3 years ago

The following bug has been logged on the website:

Bug reference: 17795
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 15.2
Operating system: Ubuntu 22.04
Description:

I've encountered a pair of relatively new anomalies when using ISO-8601
intervals.
Firstly, a float value passed as a component of the interval
can produce an overflow (I performed the following exercises with clang 14
and gcc 11.3).
Let's begin with an integer component:
select interval 'P178956970Y';
178956970 years -- OK (178956970 * 12 == 2 ^ 31 - 8)
select interval 'P178956971Y';
interval out of range -- OK (178956971 * 12 == 2 ^ 31 + 4)

Compare with a float value:
select interval 'P.178956970e9Y';
178956970 years -- OK
And:
select interval 'P.178956970e9Y6M';
178956970 years 6 mons -- OK
But:
select interval 'P.178956971e9Y';
-178956970 years -8 mons -- not OK, previously: interval out of range

Though:
select interval 'P.178956970e9Y8M';
interval field value out of range -- OK
But:
select interval 'P.178956971e9Y8M';
-178956970 years -- not OK, previously: interval out of range

As I can see, this is explained by the following code:
int extra_months = (int) rint(frac * scale * MONTHS_PER_YEAR);
return !pg_add_s32_overflow(itm_in->tm_mon, extra_months,
&itm_in->tm_mon);

Here, when calculating extra_months with an out-of-range float,
we get the sentinel value: -2147483648 == -0x80000000.
And then pg_add_s32_overflow(0, -0x80000000, ...) (internally
__builtin_add_overflow() for me)
happily returns "no overflow".
For the case 'P.178956970e9Y8M' an overflow detected because extra_months
is
near the upper limit, but is not the sentinel value.
(BTW, clang' ubsan doesn't like conversion
"(int)out_of_int_range_number".)

And for the sake of completeness, the upper limit of the double type:
select interval 'P.17976931348623158e309Y';
-178956969 years -8 mons -- not OK, previously: interval field value out of
range
select interval 'P.17976931348623159e309Y';
invalid input syntax for type interval -- OK (out of the double range)

Here "previously" is the behavior observed on e39f99046~1.

The second anomaly is with parsing float values with an integer part:
select interval 'P.0e100Y';
00:00:00 -- OK
But:
select interval 'P1.0e100Y';
1 year -- previously: invalid input syntax for type interval
select interval 'P1.0e-100Y';
1 year -- previously: invalid input syntax for type interval

IIUC, the integer part is just added to the result of parsing a rest with
the scientific notation.

Similar anomaly can be seen with the D part:
select interval 'P.1e9D';
2400000000:00:00 -- OK, previously: 100000000 days
But
select interval 'P.1e10D';
-2562047788:00:54.775807 -- not OK, previously: 1000000000 days

select interval 'P1.1e9D';
1 day 2400000000:00:00 -- not OK, previously: 1100000000 days

Probably all these anomalies can be eliminated by disabling
the scientific notation (though it more or less worked previously),
just as it is not supported for ordinary (not ISO-8601) intervals.

Tom Lane

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: PG Bug reporting form (#1)

Re: BUG #17795: Erroneous parsing of floating-poing components in DecodeISO8601Interval()

PG Bug reporting form <noreply@postgresql.org> writes:

I've encountered a pair of relatively new anomalies when using ISO-8601
intervals.

Thanks for the report! All of these seem to trace to the fact that
ParseISO8601Number isn't considering the possibility that the input
string ends with "eNN". This allows the reported "fraction" output
to have values greater than 1, which breaks some calling code as you
show, and it also fails to apply that scale factor to the integer
part of the output, which accounts for some of the other cases.

On the whole I think the right fix is to go back to using strtod()
to parse the whole input string, as we did before e39f99046. This
results in some inaccuracy if there's more than 15 digits in the
input, but I don't care enough about that scenario to work harder.

There are a couple of other places where datetime.c is using strtod()
to parse what it expects to be a fractional value. I'm inclined to
just barf if the result is out of range. Maybe we shouldn't back-patch
that aspect into v15, not sure.

Anyway I propose the attached.

regards, tom lane

Alexander Lakhin

exclusion@gmail.com

about 3 years ago

In reply to: Tom Lane (#2)

Re: BUG #17795: Erroneous parsing of floating-poing components in DecodeISO8601Interval()

20.02.2023 02:40, Tom Lane wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

I've encountered a pair of relatively new anomalies when using ISO-8601
intervals.

On the whole I think the right fix is to go back to using strtod()
to parse the whole input string, as we did before e39f99046. This
results in some inaccuracy if there's more than 15 digits in the
input, but I don't care enough about that scenario to work harder.

This approach seems less fragile to me. All my examples shown before
work as expected now.
Thanks!

There are a couple of other places where datetime.c is using strtod()
to parse what it expects to be a fractional value. I'm inclined to
just barf if the result is out of range. Maybe we shouldn't back-patch
that aspect into v15, not sure.

I could not find a way to pass the e-notation there.
Looking at the callers of the ParseFraction() and DecodeNumberField()
I see only DecodeDate(), DecodeDateTime(), DecodeTimeOnly(),
and DecodeInterval(). All these functions pass to ParseFraction()
and DecodeNumberField() not full "fractional" strings, but fields, that
were extracted before. For example, with "SELECT time 'J0.5e1';"
there are three fields "j", "0.5", "e1" passed to DecodeTimeOnly(),
or with "SELECT interval '1.5e1';" I see fields "1.5" and "e1".
(Maybe I miss something.)

Also, the comment added makes me wonder, whether ".1e-10"
(e.g., in "1.1e-10") can be considered bogus too. I would say so
(if we are going to just add a fraction produced by ParseFraction()
to some integer part later, we still get the wrong result
(for the scientific notation)), and if we want to have
a consistent syntax, maybe it's worth to check an input string for 'e'/'E',
but if not, then maybe leave it alone. I would prefer the latter for now.

Best regards,
Alexander

Tom Lane

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Alexander Lakhin (#3)

Re: BUG #17795: Erroneous parsing of floating-poing components in DecodeISO8601Interval()

Alexander Lakhin <exclusion@gmail.com> writes:

20.02.2023 02:40, Tom Lane wrote:

There are a couple of other places where datetime.c is using strtod()
to parse what it expects to be a fractional value. I'm inclined to
just barf if the result is out of range. Maybe we shouldn't back-patch
that aspect into v15, not sure.

I could not find a way to pass the e-notation there.

True, I was thinking of this as future-proofing not as closing any
live loophole.

Also, the comment added makes me wonder, whether ".1e-10"
(e.g., in "1.1e-10") can be considered bogus too. I would say so
(if we are going to just add a fraction produced by ParseFraction()
to some integer part later, we still get the wrong result
(for the scientific notation)), and if we want to have
a consistent syntax, maybe it's worth to check an input string for 'e'/'E',
but if not, then maybe leave it alone. I would prefer the latter for now.

Fair enough. I pushed the ParseISO8601Number fix without
touching the other two places.

regards, tom lane

BUG #17795: Erroneous parsing of floating-poing components in DecodeISO8601Interval()

Attachments: