Integer overflow in timestamp_part()

Started by Vitaly Burovoy about 10 years ago · 5 messages · pgsql-hackers
#1 Vitaly Burovoy
vitaly.burovoy@gmail.com

Hackers,

I've just found a little bug: extracting "epoch" from the last 30
years before Postgres' "+Infinity" leads to an integer overflow:

postgres=# SELECT x::timestamptz, extract(epoch FROM x::timestamptz)
postgres-# FROM
postgres-# (VALUES
postgres(# ('294247-01-10 04:00:54.775805'),
postgres(# ('294247-01-10 04:00:55.775806'),
postgres(# ('294277-01-09 04:00:54.775806'), -- the last value before 'Inf'
postgres(# ('294277-01-09 04:00:54.775807') -- we've discussed, it should be fixed
postgres(# ) as t(x);
x | date_part
---------------------------------+-------------------
294247-01-10 04:00:54.775805+00 | 9223372036854.78
294247-01-10 04:00:55.775806+00 | -9223372036853.78
294277-01-09 04:00:54.775806+00 | -9222425352054.78
infinity | Infinity
(4 rows)

With the attached patch it becomes positive:
x | date_part
---------------------------------+------------------
294247-01-10 04:00:54.775805+00 | 9223372036854.78
294247-01-10 04:00:55.775806+00 | 9223372036855.78
294277-01-09 04:00:54.775806+00 | 9224318721654.78
infinity | Infinity
(4 rows)

--
Best regards,
Vitaly Burovoy

Attachments:

fix_extract_overflow_v1.patch (application/octet-stream, +16 -2)
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vitaly Burovoy (#1)
Re: Integer overflow in timestamp_part()

[ Please use a useful Subject: line in your posts. ]

Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:

I've just found a little bug: extracting "epoch" from the last 30
years before Postgres' "+Infinity" leads to an integer overflow:

Hmm. I do not like the proposed patch much: it looks like it's
throwing away precision too soon, although given that the result of
SetEpochTimestamp can be cast to float exactly, maybe it doesn't matter.

More importantly, I seriously doubt that this is the only issue
for timestamps very close to the INT64_MAX boundary. An example is
that we're not rejecting values that would correspond to DT_NOBEGIN
or DT_NOEND:

regression=# set timezone = 'PST8PDT';
SET
regression=# select '294277-01-08 20:00:54.775806-08'::timestamptz;
timestamptz
---------------------------------
294277-01-08 20:00:54.775806-08
(1 row)

regression=# select '294277-01-08 20:00:54.775807-08'::timestamptz;
timestamptz
-------------
infinity
(1 row)

regression=# select '294277-01-08 20:00:54.775808-08'::timestamptz;
timestamptz
-------------
-infinity
(1 row)

regression=# select '294277-01-08 20:00:54.775809-08'::timestamptz;
ERROR: timestamp out of range

Worse yet, that last error is coming from timestamptz_out, not
timestamptz_in; we're accepting a value we cannot store properly.
The converted value has actually overflowed to be equal to
INT64_MIN+1, and then timestamptz_out barfs because it's before
Julian day 0. Other operations would incorrectly interpret it
as a date in the very far past. timestamptz_in doesn't throw an
error until several hours later than this; it looks like the
problem is that tm2timestamp() worries about overflow in initially
calculating the converted value, but not about overflow in the
dt2local() rotation, and in any case it doesn't worry about not
producing DT_NOEND.

I'm inclined to think that a good solution would be to create an
artificial restriction to not accept years beyond, say, 100000 AD.
That would leave us with a lot of daylight to not have to worry
about corner-case overflows in timestamp arithmetic. I'm not sure
though where we'd need to enforce such a restriction; certainly in
timestamp[tz]_in, but where else?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#2)
Re: Integer overflow in timestamp_part()

On 2/2/16 6:39 PM, Tom Lane wrote:

I'm inclined to think that a good solution would be to create an
artificial restriction to not accept years beyond, say, 100000 AD.
That would leave us with a lot of daylight to not have to worry
about corner-case overflows in timestamp arithmetic. I'm not sure
though where we'd need to enforce such a restriction; certainly in
timestamp[tz]_in, but where else?

Probably some of the casts (I'd think at least timestamp->timestamptz).
Maybe timestamp[tz]_recv. Most of the time*pl* functions. :/
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


#4 Vitaly Burovoy
vitaly.burovoy@gmail.com
In reply to: Tom Lane (#2)
Re: Integer overflow in timestamp_part()

On 2/2/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

[ Please use a useful Subject: line in your posts. ]

I'm so sorry; it was the first time I had forgotten to check the
"Subject" field before pressing the "Send" button.

Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:

I've just found a little bug: extracting "epoch" from the last 30
years before Postgres' "+Infinity" leads to an integer overflow:

Hmm. I do not like the proposed patch much: it looks like it's
throwing away precision too soon, although given that the result of
SetEpochTimestamp can be cast to float exactly, maybe it doesn't matter.

More importantly, I seriously doubt that this is the only issue
for timestamps very close to the INT64_MAX boundary. An example is
that we're not rejecting values that would correspond to DT_NOBEGIN
or DT_NOEND:

regression=# set timezone = 'PST8PDT';
SET
regression=# select '294277-01-08 20:00:54.775806-08'::timestamptz;
timestamptz
---------------------------------
294277-01-08 20:00:54.775806-08
(1 row)

regression=# select '294277-01-08 20:00:54.775807-08'::timestamptz;
timestamptz
-------------
infinity
(1 row)

regression=# select '294277-01-08 20:00:54.775808-08'::timestamptz;
timestamptz
-------------
-infinity
(1 row)

regression=# select '294277-01-08 20:00:54.775809-08'::timestamptz;
ERROR: timestamp out of range

Worse yet, that last error is coming from timestamptz_out, not
timestamptz_in; we're accepting a value we cannot store properly.
The converted value has actually overflowed to be equal to
INT64_MIN+1, and then timestamptz_out barfs because it's before
Julian day 0. Other operations would incorrectly interpret it
as a date in the very far past. timestamptz_in doesn't throw an
error until several hours later than this; it looks like the
problem is that tm2timestamp() worries about overflow in initially
calculating the converted value, but not about overflow in the
dt2local() rotation, and in any case it doesn't worry about not
producing DT_NOEND.

It is clear why it happens, and I was already planning to insert
checks there, as discussed in the thread[1].

I'm inclined to think that a good solution would be to create an
artificial restriction to not accept years beyond, say, 100000 AD.

Well... We can limit it to the boundaries described on the
documentation page[2]: [4713 BC, 294276 AD]. That lets us be sure we
will not break stamps that are stored (for any reason) according to
the documentation (values meant as infinity, but not stored exactly as
'infinity'::timestamptz).

Currently the boundaries for timestamp[tz] are [4714-11-24+00 BC,
294277-01-09 04:00:54.775806+00]. One month of headroom at the lower
boundary and 9 days at the upper one should be enough to represent a
value as int64 before applying the time zone (±15 hours) and to check
the boundaries without an overflow.

That would leave us with a lot of daylight to not have to worry
about corner-case overflows in timestamp arithmetic.

Great! That was my next question, because I had despaired of finding
a good corner case for the internal version of the "to_timestamp"
function (my not-yet-published WIP patch) that supports
±Infinity::float8 as input, to be symmetric with the current
"extract('epoch'...)"[3].

The exact value for the current allowed maximum ("294277-01-09
04:00:54.775806+00") should be 9224318721654.775806, but it cannot be
represented as float8. My experiments show there is a "big" gap of
milliseconds (~0.002, not 0.000001):

src | representation
----------------------+------------------------
9224318721654.774414 | 9224318721654.7734375
9224318721654.774415 | 9224318721654.775390625
9224318721654.776367 | 9224318721654.775390625
9224318721654.776368 | 9224318721654.77734375
9224318721654.778320 | 9224318721654.77734375
9224318721654.778321 | 9224318721654.779296875

So if it is possible to set the upper limit exactly at a year
boundary, it would save a lot of nerves.

I'm not sure
though where we'd need to enforce such a restriction; certainly in
timestamp[tz]_in, but where else?

regards, tom lane

What to do with dates: [4713 BC, 5874897 AD]? Limit them to the
timestamp boundaries, or leave them as-is and forbid conversion when
they don't fit into timestamps?

There is also trouble with intervals. Currently this is impossible
(even without the overflow on extracting):

postgres=# select extract(epoch from '294277-01-09 04:00:54.775806+00'::timestamptz);
date_part
------------------
9224318721654.78
(1 row)

postgres=# select to_timestamp(9224318721654.775390625); -- because
...654.78 is rounded up, but we know the exact value
to_timestamp
---------------------------------
294277-01-09 04:00:54.77539+00
(1 row)

... you get the error:

postgres=# select to_timestamp(9224318721654.775390625);
ERROR: interval out of range
CONTEXT: SQL function "to_timestamp" statement 1

... at the operation "$1 * '1 second'::pg_catalog.interval" in the
pg_catalog.to_timestamp function:

postgres=# select 9224318721654.775390625 * '1 second'::pg_catalog.interval;
ERROR: interval out of range

... even with a valid 64-bit signed integer:

postgres=# select 9223372036000 * '1 second'::pg_catalog.interval;
ERROR: interval out of range

postgres=# select 7730941132799.99951 * '1 second'::interval as max_interval_sec, to_timestamp(7730941132799.99951);
max_interval_sec | to_timestamp
-------------------------+---------------------------------
2147483647:59:59.998976 | 246953-10-09 07:59:59.999023+00
(1 row)

despite the fact that the documentation[2] defines the range of
intervals as [-178000000, 178000000] years (which doesn't fit even
into dates) with resolution "1 microsecond / 14 digits"... I know the
cause ((7730941132799+1)/3600 = 2^31), but the documentation seems to
lack a note about it.

===
[1]: /messages/by-id/CAKOSWNmLjs_2EyS4z_bDkp=RfU8s2M2P8joZVt18h+wfszWz7A@mail.gmail.com
[2]: http://www.postgresql.org/docs/devel/static/datatype-datetime.html
[3]: http://git.postgresql.org/pg/commitdiff/647d87c56ab6da70adb753c08d7cdf7ee905ea8a

--
Best regards,
Vitaly Burovoy


#5 Bruce Momjian
bruce@momjian.us
In reply to: Vitaly Burovoy (#1)
Re:

On Sat, Jan 30, 2016 at 11:06:10PM -0800, Vitaly Burovoy wrote:

Hackers,

I've just found a little bug: extracting "epoch" from the last 30
years before Postgres' "+Infinity" leads to an integer overflow:

postgres=# SELECT x::timestamptz, extract(epoch FROM x::timestamptz)
postgres-# FROM
postgres-# (VALUES
postgres(# ('294247-01-10 04:00:54.775805'),
postgres(# ('294247-01-10 04:00:55.775806'),
postgres(# ('294277-01-09 04:00:54.775806'), -- the last value before 'Inf'
postgres(# ('294277-01-09 04:00:54.775807') -- we've discussed, it should be fixed
postgres(# ) as t(x);
x | date_part
---------------------------------+-------------------
294247-01-10 04:00:54.775805+00 | 9223372036854.78
294247-01-10 04:00:55.775806+00 | -9223372036853.78
294277-01-09 04:00:54.775806+00 | -9222425352054.78
infinity | Infinity
(4 rows)

With the attached patch it becomes positive:
x | date_part
---------------------------------+------------------
294247-01-10 04:00:54.775805+00 | 9223372036854.78
294247-01-10 04:00:55.775806+00 | 9223372036855.78
294277-01-09 04:00:54.775806+00 | 9224318721654.78
infinity | Infinity
(4 rows)

FYI, in 9.6 this will return an error:

test=> SELECT x::timestamptz, extract(epoch FROM x::timestamptz)
FROM
(VALUES
('294247-01-10 04:00:54.775805'),
('294247-01-10 04:00:55.775806'),
('294277-01-09 04:00:54.775806'), -- the last value before 'Inf'
('294277-01-09 04:00:54.775807') -- we've discussed, it should be fixed
) as t(x);
ERROR: timestamp out of range: "294277-01-09 04:00:54.775806"

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
