Floating-point timestamps versus Range Types

Started by Jeff Davisover 15 years ago29 messageshackers
Jump to latest
#1Jeff Davis
pgsql@j-davis.com

I'm working on the design for Range Types for 9.1:

http://wiki.postgresql.org/wiki/RangeTypes

But I think that floating-point timestamps may pose a problem. In this
thread:

http://archives.postgresql.org/pgsql-bugs/2010-08/msg00378.php

I pointed out that floating-point timestamps can become a different
value when dumped and reloaded (that is, the type input and output
functions are not perfect inverses).

This is a problem now, in theory, with UNIQUE indexes because the data
might be inserted once, but then a dump/reload may violate the
constraint. But it's not very common to include a high-precision
timestamp in a UNIQUE index, so it's not much of a practical problem.

However, with Exclusion Constraints and Range Types, I expect this
problem may become a practical problem, because Exclusion Constraints
are designed to work with things like timestamps. Also, with Range
Types, I expect people to generally care more about the precise
boundaries; so basic sanity (like input and output functions that are
inverses) seems more important.

What should be done? I see a few options:

1. Do nothing. Floating-point timestamps aren't the default, and the bug
reports are likely to be few and far between (but those that encounter
the bug are likely to be very frustrated).

2. Fix the input/output functions in a special mode for dump/reload, to
make them true inverses. This isn't easy to do by just adding extra
digits. But perhaps we could hack up the text format to have a special
escape that is really an encoded binary representation, or maybe
resembling float8.

3. Somehow deprecate floating point timestamps or make them unusable in
conjunction with Range Types. I'm not sure if there is demand to keep
them alive or not.

Thoughts?

Regards,
Jeff Davis

#2David E. Wheeler
david@kineticode.com
In reply to: Jeff Davis (#1)
Re: Floating-point timestamps versus Range Types

On Oct 17, 2010, at 9:56 AM, Jeff Davis wrote:

3. Somehow deprecate floating point timestamps or make them unusable in
conjunction with Range Types. I'm not sure if there is demand to keep
them alive or not.

+1

David

#3Joshua D. Drake
jd@commandprompt.com
In reply to: David E. Wheeler (#2)
Re: Floating-point timestamps versus Range Types

On Sun, 2010-10-17 at 10:00 -0700, David E. Wheeler wrote:

On Oct 17, 2010, at 9:56 AM, Jeff Davis wrote:

3. Somehow deprecate floating point timestamps or make them unusable in
conjunction with Range Types. I'm not sure if there is demand to keep
them alive or not.

This seems the best solution. We finally got around to making
--integer-datetimes the default in 8.4.

The only major distribution that I know of that ships the deprecated
configuration is RedHat/Fedora. I don't know when that will change.

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#1)
Re: Floating-point timestamps versus Range Types

Jeff Davis <pgsql@j-davis.com> writes:

What should be done? I see a few options:

1. Do nothing. Floating-point timestamps aren't the default, and the bug
reports are likely to be few and far between (but those that encounter
the bug are likely to be very frustrated).

I'm for that one. Anybody working with fractional float timestamps
should already understand that they aren't exact. I can't see the value
of expending any great amount of effort on this.

There is maybe some argument for removing the float timestamp code
altogether, but I think that that's probably premature. They were
still the default in 8.3, and we are still supporting in-place upgrade
from 8.3.

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#3)
Re: Floating-point timestamps versus Range Types

"Joshua D. Drake" <jd@commandprompt.com> writes:

The only major distribution that I know of that ships the deprecated
configuration is RedHat/Fedora. I don't know when that will change.

Red Hat switched to integer datetimes as of 8.4 ... just like upstream.
Please don't imagine that you can complain that Red Hat is behind the
curve on that.

regards, tom lane

#6Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#4)
Re: Floating-point timestamps versus Range Types

On Sun, 2010-10-17 at 16:17 -0400, Tom Lane wrote:

I'm for that one. Anybody working with fractional float timestamps
should already understand that they aren't exact. I can't see the value
of expending any great amount of effort on this.

OK.

There is maybe some argument for removing the float timestamp code
altogether, but I think that that's probably premature. They were
still the default in 8.3, and we are still supporting in-place upgrade
from 8.3.

Regarding Josh Drake's comment, do you have any insight about when
Redhat will start to ship with integer timestamps? That seems like the
determining factor for when we can deprecate floating-point timestamps.

Regards,
Jeff Davis

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#6)
Re: Floating-point timestamps versus Range Types

Jeff Davis <pgsql@j-davis.com> writes:

On Sun, 2010-10-17 at 16:17 -0400, Tom Lane wrote:

There is maybe some argument for removing the float timestamp code
altogether, but I think that that's probably premature. They were
still the default in 8.3, and we are still supporting in-place upgrade
from 8.3.

Regarding Josh Drake's comment, do you have any insight about when
Redhat will start to ship with integer timestamps? That seems like the
determining factor for when we can deprecate floating-point timestamps.

At the earliest, we could consider dropping them when we drop support
for in-place upgrade from 8.3 --- not only direct upgrade, but through
multiple pg_upgrade steps. That's assuming that we think there are
no users who are depending on float timestamps for functionality (they
have a wider range than int timestamps don't they?). I don't believe
that Red Hat's choices enter into this in the slightest: they aren't
doing anything different from users who compile from source.

Anyway the short answer seems to be that we can consider dropping them
when we next break on-disk compatibility.

regards, tom lane

#8Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#7)
Re: Floating-point timestamps versus Range Types

On 10/17/2010 04:40 PM, Tom Lane wrote:

At the earliest, we could consider dropping them when we drop support
for in-place upgrade from 8.3 --- not only direct upgrade, but through
multiple pg_upgrade steps. That's assuming that we think there are
no users who are depending on float timestamps for functionality (they
have a wider range than int timestamps don't they?).

Yes, they do.

Maybe we need to look at providing a bigtimestamp type or similar at
some stage. Or maybe the demand for it would be so low it should be an
add-on module.

cheers

andrew

#9Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#5)
Re: Floating-point timestamps versus Range Types

On Sun, 2010-10-17 at 16:27 -0400, Tom Lane wrote:

"Joshua D. Drake" <jd@commandprompt.com> writes:

The only major distribution that I know of that ships the deprecated
configuration is RedHat/Fedora. I don't know when that will change.

Red Hat switched to integer datetimes as of 8.4 ... just like upstream.
Please don't imagine that you can complain that Red Hat is behind the
curve on that.

I wasn't imagining please don't get your boxers in a bunch. I was
misinformed and I apologize.

JD

regards, tom lane

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

#10Bruce Momjian
bruce@momjian.us
In reply to: Joshua D. Drake (#3)
Re: Floating-point timestamps versus Range Types

On Sun, Oct 17, 2010 at 12:03 PM, Joshua D. Drake <jd@commandprompt.com> wrote:

The only major distribution that I know of that ships the deprecated
configuration is RedHat/Fedora. I don't know when that will change.

If only we knew someone in Redhat :)

iirc the issue was binary upgrades. So I suspect the answer will be
"the next time we break pg_upgrade and require a dump/reload". This is
the cost we chose to pay for binary upgrades.

--
greg

#11Joshua D. Drake
jd@commandprompt.com
In reply to: Bruce Momjian (#10)
Re: Floating-point timestamps versus Range Types

On Sun, 2010-10-17 at 15:52 -0700, Greg Stark wrote:

On Sun, Oct 17, 2010 at 12:03 PM, Joshua D. Drake <jd@commandprompt.com> wrote:

The only major distribution that I know of that ships the deprecated
configuration is RedHat/Fedora. I don't know when that will change.

If only we knew someone in Redhat :)

Fortunately for those in the know, you can get --integer-datetimes with
Red Hat. My information was a little outdated.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#8)
Re: Floating-point timestamps versus Range Types

Andrew Dunstan <andrew@dunslane.net> writes:

On 10/17/2010 04:40 PM, Tom Lane wrote:

... That's assuming that we think there are
no users who are depending on float timestamps for functionality (they
have a wider range than int timestamps don't they?).

Yes, they do.

Maybe we need to look at providing a bigtimestamp type or similar at
some stage. Or maybe the demand for it would be so low it should be an
add-on module.

[ rechecks the code & docs ... ] In principle float timestamps could
have a ridiculously wide range, on the order of 140 million years if
you assume that 1-second precision is sufficient. In practice they are
constrained by our use of nonnegative 32-bit integers for Julian Day
counts, which restricts the range to be from 4713 BC to 5 million years
and change AD. 64-bit-int timestamps have a theoretical range of about
plus or minus 300 thousand years, which again is restricted on the BC
side by the Julian Day code. We could push out the 5M AD limit by
converting the JD code to 64-bit ints, but it's not clear there's any
interest in that given that it won't do a thing for the integer
timestamp case (and I'm not sure if the equations are really correct
so far out, anyway).

So the bottom line question is whether somebody has a use for Gregorian
calendar dates between 300K AD and 5M AD, while not needing to go back
before 4K BC. I should think that the BC-side limit pretty much renders
this datatype pointless for astronomers and geologists, even if they
wanted to count in Gregorian dates; and I can't think of any other
communities that are going to care much about dates that far out.
So, if there's a use-case at all, it's not interesting enough to include
in core.

IOW I don't think the range argument holds much water for keeping float
timestamps alive. The on-disk-compatibility argument does, though.

regards, tom lane

#13Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#12)
Re: Floating-point timestamps versus Range Types

On Mon, Oct 18, 2010 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

IOW I don't think the range argument holds much water for keeping float
timestamps alive.  The on-disk-compatibility argument does, though.

Right. I think your argument that we should "do nothing" upthread is
exactly right. Deprecating float timestamps doesn't solve any real
problem. As of today, we can assume that anyone who is still using
float timestamps is doing so because they are doing in-place upgrade
from an older version. If we do nothing, the worst thing that can
possibly happen is that MAYBE they will have some difficulties if they
use floating timestamps in combination with the range types Jeff is
proposing to implement. Or, we can remove integer date time support
and categorically prevent them from using pg_upgrade whether they care
about range types or not, and whether they actually would have
experienced problems with them or not. AFAICS, that's just being
unfriendly to no purpose.

A more interesting question is whether and how we can ease the
migration path from float timestamps to integer timestamps. Even
without range types, if someone does have a UNIQUE index on a
timestamp column, could they get an error if they dump from a
float-timestamp version of PG and restore onto an integer-timestamp
version? How would we recommend that they recover from that
situation?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#13)
Re: Floating-point timestamps versus Range Types

On Mon, 2010-10-18 at 14:06 -0400, Robert Haas wrote:

Right. I think your argument that we should "do nothing" upthread is
exactly right.

OK.

A more interesting question is whether and how we can ease the
migration path from float timestamps to integer timestamps. Even
without range types, if someone does have a UNIQUE index on a
timestamp column, could they get an error if they dump from a
float-timestamp version of PG and restore onto an integer-timestamp
version?

Yes. They could also get an error if they dump from a float-timestamp
version and restore into a float-timestamp version. [ That's because for
float-timestamps typinput(typoutput(VALUE)) may not equal VALUE. ]

How would we recommend that they recover from that
situation?

COPY the data out in binary mode, or they have already lost data (if
using float-timestamps).

A reasonable conversion path might be to offer integer timestamps using
a different type name (e.g. inttimestamp) that always means integer
timestamps. Then, they could convert using ALTER TABLE, then do an
in-place upgrade. We could even make pg_upgrade optionally convert
inttimestamp to timestamp in O(1) on an integer-timestamps build.

Regards,
Jeff Davis

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#13)
Re: Floating-point timestamps versus Range Types

Robert Haas <robertmhaas@gmail.com> writes:

A more interesting question is whether and how we can ease the
migration path from float timestamps to integer timestamps. Even
without range types, if someone does have a UNIQUE index on a
timestamp column, could they get an error if they dump from a
float-timestamp version of PG and restore onto an integer-timestamp
version?

In principle yes, but I think the risk is pretty hypothetical.
Currently (2010, ten years out from the internal epoch) the effective
resolution of IEEE-float-based timestamps is about a tenth of a
microsecond. Thus for example, on 8.3 I get

regression=# select '2010-10-18 14:35:14.6164431-04'::timestamptz = '2010-10-18 14:35:14.6164432-04'::timestamptz;
?column?
----------
f
(1 row)

regression=# select '2010-10-18 14:35:14.6164431-04'::timestamptz = '2010-10-18 14:35:14.6164431-04'::timestamptz;
?column?
----------
t
(1 row)

regression=# select '2010-10-18 14:35:14.6164431-04'::timestamptz = '2010-10-18 14:35:14.61644311-04'::timestamptz;
?column?
----------
t
(1 row)

whereas an int-timestamp build sees these inputs as all the same.
Thus, to get into trouble you'd need to have a unique index on data that
conflicts at the microsecond scale but not at the tenth-of-a-microsecond
scale. This seems implausible. In particular, you didn't get any such
data from now(), which relies on Unix APIs that don't go below
microsecond precision. You might conceivably have entered such data
externally, as I did above, but you'd have to not notice/care that it
wasn't coming back out at the same precision. And you'd have to never
have dumped/reloaded using pg_dump, or the low order digits would have
vanished already. And you'd have to not be dealing with data outside
a range of roughly 1900-2100, or the precision of floats would actually
be worse than ints.

So the argument seems academic to me ...

regards, tom lane

#16Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#14)
Re: Floating-point timestamps versus Range Types

On Mon, Oct 18, 2010 at 2:29 PM, Jeff Davis <pgsql@j-davis.com> wrote:

A reasonable conversion path might be to offer integer timestamps using
a different type name (e.g. inttimestamp) that always means integer
timestamps. Then, they could convert using ALTER TABLE, then do an
in-place upgrade. We could even make pg_upgrade optionally convert
inttimestamp to timestamp in O(1) on an integer-timestamps build.

I think in retrospect it would certainly have been better to make
integer timestamps and float timestamps two separate data types,
rather than two versions of the same data type. Whether it's worth
providing that now after the fact is not clear to me. I'd be inclined
to wait and see whether we get many complaints...

One problem with changing types in pg_upgrade is that type OIDs can
get embedded in the on-disk representation - I believe that this
happens for arrays, for instance. So I think it's practical for
pg_upgrade to change type names during a version upgrade, but not type
OIDs.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#15)
Re: Floating-point timestamps versus Range Types

On Mon, 2010-10-18 at 14:49 -0400, Tom Lane wrote:

whereas an int-timestamp build sees these inputs as all the same.
Thus, to get into trouble you'd need to have a unique index on data that
conflicts at the microsecond scale but not at the tenth-of-a-microsecond
scale. This seems implausible. In particular, you didn't get any such
data from now(), which relies on Unix APIs that don't go below
microsecond precision. You might conceivably have entered such data
externally, as I did above, but you'd have to not notice/care that it
wasn't coming back out at the same precision.

You can also get there via interval math, like multiplying by a numeric.
That seems slightly more plausible.

So the argument seems academic to me ...

With UNIQUE indexes I agree completely. If nothing else, who puts a
UNIQUE index on high-precision timestamps? And the problem has existed
for a long time already, it's nothing new.

With Exclusion Constraints, it's slightly less academic, and it's a new
addition. Still pretty far-fetched; but at least plausible, which is why
I brought it up.

However, I won't argue with the "don't do anything" approach to
float-timestamps.

Regards,
Jeff Davis

#18Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#16)
Re: Floating-point timestamps versus Range Types

Robert Haas wrote:

On Mon, Oct 18, 2010 at 2:29 PM, Jeff Davis <pgsql@j-davis.com> wrote:

A reasonable conversion path might be to offer integer timestamps using
a different type name (e.g. inttimestamp) that always means integer
timestamps. Then, they could convert using ALTER TABLE, then do an
in-place upgrade. We could even make pg_upgrade optionally convert
inttimestamp to timestamp in O(1) on an integer-timestamps build.

I think in retrospect it would certainly have been better to make
integer timestamps and float timestamps two separate data types,
rather than two versions of the same data type. Whether it's worth
providing that now after the fact is not clear to me. I'd be inclined
to wait and see whether we get many complaints...

One problem with changing types in pg_upgrade is that type OIDs can
get embedded in the on-disk representation - I believe that this
happens for arrays, for instance. So I think it's practical for
pg_upgrade to change type names during a version upgrade, but not type
OIDs.

One thing we have talked about is converting the page on read-in from
the backend. Since the timestamps are the same size as float or
integer, that might be possible.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#19Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#18)
Re: Floating-point timestamps versus Range Types

On Thu, Oct 21, 2010 at 4:49 PM, Bruce Momjian <bruce@momjian.us> wrote:

One thing we have talked about is converting the page on read-in from
the backend.  Since the timestamps are the same size as float or
integer, that might be possible.

Did we have a solution for the problem that understanding which
columns are timestamps requires having a tuple descriptor and parsing
the every tuple? That seems like it would a) be slow and b) require a
lot of high level code in the middle of a low-level codepath.

--
greg

#20Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#19)
Re: Floating-point timestamps versus Range Types

Greg Stark wrote:

On Thu, Oct 21, 2010 at 4:49 PM, Bruce Momjian <bruce@momjian.us> wrote:

One thing we have talked about is converting the page on read-in from
the backend. ?Since the timestamps are the same size as float or
integer, that might be possible.

Did we have a solution for the problem that understanding which
columns are timestamps requires having a tuple descriptor and parsing
the every tuple? That seems like it would a) be slow and b) require a
lot of high level code in the middle of a low-level codepath.

Yep, that's what it requires. It would rewrite in the new format.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#20)
#22Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#21)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#21)
#24James Cloos
cloos@jhcloos.com
In reply to: Jeff Davis (#1)
#25Jeff Davis
pgsql@j-davis.com
In reply to: James Cloos (#24)
#26James Cloos
cloos@jhcloos.com
In reply to: Jeff Davis (#25)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: James Cloos (#26)
#28James Cloos
cloos@jhcloos.com
In reply to: Tom Lane (#27)
#29Robert Haas
robertmhaas@gmail.com
In reply to: James Cloos (#28)