Proposal: temporal extension "period" data type

Started by Jeff Davisover 17 years ago10 messages
#1Jeff Davis
pgsql@j-davis.com

I maintain the project "Temporal PostgreSQL" on pgfoundry:

http://pgfoundry.org/projects/temporal

Reference docs available here:
http://temporal.projects.postgresql.org/reference.html

This project provides a data type called "period" (formerly t_interval).
This type is an interval in the mathematical sense, that is, it has a
beginning time and end time (timestamptz, to be specific). This is not
like the SQL type "interval" which is not anchored at any specific point
in time.

This is valuable for many applications that need to be able to
manipulate time. This functionality is sometimes called "bi-temporal",
because people sometimes represent it as two timestamps in two columns.

Representing the information in one column of "periods" has important
advantages such as:
* It's much simpler and less error-prone to write queries expressing
"overlaps", "contains", "intersection", etc.
* We can effectively index this type with GiST. There's no effective
way to index two separate timestamp columns.

During PGCon 2008, several people encouraged me to submit the code for
inclusion in the core.

The advantages of including it in core are that some features can't be
done from pgfoundry, such as:
* temporal foreign keys
* temporal joins
* syntax like "ALTER TABLE ... ADD LOG".
On the other hand, I don't currently have proposals for any of those
things. And including in core has the usual drawbacks, like waiting for
a new PostgreSQL release just to get some improvement for the data type.

1. Should it be included in core, or remain on pgfoundry?

2. If it should be included in core, I'd like to know if any changes
should be made to the API, available operators, or the names of anything
(see the reference docs). The current name of the type is "period" to
avoid confusion with SQL's misnamed "interval" type. The operators are
mostly self-explanatory, but I'm open to suggestion for better names for
those, too.

3. I'd like to get some help with details like the optimal GiST
picksplit function to use, and selectivity functions, and analyze
functions.

4. Should we replace the undocumented type "tinterval"?

Regards,
Jeff Davis

#2Andrew Dunstan
andrew@dunslane.net
In reply to: Jeff Davis (#1)
Re: Proposal: temporal extension "period" data type

Jeff Davis wrote:

During PGCon 2008, several people encouraged me to submit the code for
inclusion in the core.

The advantages of including it in core are that some features can't be
done from pgfoundry, such as:
* temporal foreign keys
* temporal joins

Maybe I'm being dense, but how does inclusion in core help you do these
things? Or, more precisely, how does non-inclusion in core prevent them.

* syntax like "ALTER TABLE ... ADD LOG".

What does this mean?

cheers

andrew

#3Greg Smith
gsmith@gregsmith.com
In reply to: Jeff Davis (#1)
Re: Proposal: temporal extension "period" data type

On Mon, 26 May 2008, Jeff Davis wrote:

And including in core has the usual drawbacks, like waiting for a new
PostgreSQL release just to get some improvement for the data type.

How stable has the core code implementation been recently? If major
changes to it are unlikely anyway because it's matured, this downside is a
lot less important.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#4Jeff Davis
pgsql@j-davis.com
In reply to: Greg Smith (#3)
Re: Proposal: temporal extension "period" data type

On Mon, 2008-05-26 at 08:49 -0400, Greg Smith wrote:

On Mon, 26 May 2008, Jeff Davis wrote:

And including in core has the usual drawbacks, like waiting for a new
PostgreSQL release just to get some improvement for the data type.

How stable has the core code implementation been recently? If major
changes to it are unlikely anyway because it's matured, this downside is a
lot less important.

Aside from name changes, the code has been fairly static. However, I
expect that others will have suggestions about improving the picksplit
algorithm, selectivity functions, etc.

Regards,
Jeff Davis

#5Jeff Davis
pgsql@j-davis.com
In reply to: Andrew Dunstan (#2)
Re: Proposal: temporal extension "period" data type

On Mon, 2008-05-26 at 06:49 -0400, Andrew Dunstan wrote:

* temporal foreign keys
* temporal joins

Maybe I'm being dense, but how does inclusion in core help you do these
things? Or, more precisely, how does non-inclusion in core prevent them.

Temporal FK's can be implemented with triggers, but can't be
implemented with the current FK syntax and mechanism. Any extensions to
the FK syntax to support this would need to be done in postgres.

Temporal joins can be implemented by joining with the "overlaps"
operator and selecting the intersection of the two period columns.
Anything added would be syntactic sugar.

* syntax like "ALTER TABLE ... ADD LOG".

What does this mean?

The general idea is the same as the idea behind pgfoundry projects like:

http://pgfoundry.org/projects/tablelog/ and
http://pgfoundry.org/projects/aupg/

The idea is to record all deleted or updated tuples in a table so that
you can see a historical state of the table (i.e. a snapshot query) or
see some changes to the data over time.

Having some kind of syntax to back up a feature like that could be
useful for simplifying the creation of a log. Also, people may be more
comfortable using some built-in syntax if they are creating an audit log
for security purposes.

Regards,
Jeff Davis

#6Hannu Krosing
hannu@krosing.net
In reply to: Jeff Davis (#5)
Re: Proposal: temporal extension "period" data type

On Mon, 2008-05-26 at 10:59 -0700, Jeff Davis wrote:

On Mon, 2008-05-26 at 06:49 -0400, Andrew Dunstan wrote:

* temporal foreign keys
* temporal joins

Maybe I'm being dense, but how does inclusion in core help you do these
things? Or, more precisely, how does non-inclusion in core prevent them.

Temporal FK's can be implemented with triggers, but can't be
implemented with the current FK syntax and mechanism. Any extensions to
the FK syntax to support this would need to be done in postgres.

Temporal joins can be implemented by joining with the "overlaps"
operator and selecting the intersection of the two period columns.
Anything added would be syntactic sugar.

Either overlaps or contains seem valid join operators

Will there be UNIQUE (non-overlapping) indexes for period ?

--------------
Hannu

#7Heikki Linnakangas
heikki@enterprisedb.com
In reply to: Jeff Davis (#1)
Re: Proposal: temporal extension "period" data type

Jeff Davis wrote:

The advantages of including it in core are that some features can't be
done from pgfoundry, such as:
* temporal foreign keys
* temporal joins
* syntax like "ALTER TABLE ... ADD LOG".
On the other hand, I don't currently have proposals for any of those
things. And including in core has the usual drawbacks, like waiting for
a new PostgreSQL release just to get some improvement for the data type.

1. Should it be included in core, or remain on pgfoundry?

I think it's a good candidate for inclusion in core. Obviously it's not
ready for that yet, but it does seem that a complete solution, with
joins etc., can't be achieved purely with user-defined operators.

2. If it should be included in core, I'd like to know if any changes
should be made to the API, available operators, or the names of anything
(see the reference docs). The current name of the type is "period" to
avoid confusion with SQL's misnamed "interval" type. The operators are
mostly self-explanatory, but I'm open to suggestion for better names for
those, too.

Instead of one datatype, I'd like to see the capability of defining
period types based on any datatype with suitable opclass. Periods of
dates and floats, for example, would be just as useful as periods of
timestamps.

4. Should we replace the undocumented type "tinterval"?

I didn't realize we have one. It seems that tinterval provides operators
like overlaps and contains as well. What's the difference between what
you're working on and tinterval?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#8Jeff Davis
pgsql@j-davis.com
In reply to: Heikki Linnakangas (#7)
Re: Proposal: temporal extension "period" data type

On Mon, 2008-05-26 at 19:11 +0100, Heikki Linnakangas wrote:

I think it's a good candidate for inclusion in core. Obviously it's not
ready for that yet, but it does seem that a complete solution, with
joins etc., can't be achieved purely with user-defined operators.

Joins can be acheived with user-defined operators by using overlaps and
intersection. However, it's ugly enough that not many people would
actually use them.

2. If it should be included in core, I'd like to know if any changes
should be made to the API, available operators, or the names of anything
(see the reference docs). The current name of the type is "period" to
avoid confusion with SQL's misnamed "interval" type. The operators are
mostly self-explanatory, but I'm open to suggestion for better names for
those, too.

Instead of one datatype, I'd like to see the capability of defining
period types based on any datatype with suitable opclass. Periods of
dates and floats, for example, would be just as useful as periods of
timestamps.

I agree, although we couldn't use the word "period" to describe floats
(at least, that's not how I understand the word).

Do we have any mechanism for that kind of a type generator? Typmod might
work, although that seems awkward.

4. Should we replace the undocumented type "tinterval"?

I didn't realize we have one. It seems that tinterval provides operators
like overlaps and contains as well. What's the difference between what
you're working on and tinterval?

"period" is based on timestamptz, "tinterval" is based on "abstime".
Also, my type can be inclusive or exclusive as needed, I couldn't find a
way to make "tinterval" exclusive on either side.

Regards,
Jeff Davis

#9Jeff Davis
pgsql@j-davis.com
In reply to: Hannu Krosing (#6)
Re: Proposal: temporal extension "period" data type

On Mon, 2008-05-26 at 21:10 +0300, Hannu Krosing wrote:

Either overlaps or contains seem valid join operators

"Overlaps" is the join operator as defined in [1]. In the case where you
have a temporal FK, they are the same. But yes, either is useful.

Will there be UNIQUE (non-overlapping) indexes for period ?

I discussed this with several people and I believe this is possible with
some modifications to GiST. I will be discussing it further and I will
try to put together a proposal.

Regards,
Jeff Davis

1. _Temporal Data and the Relational Model_, C.J. Date, Hugh Darwen,
Nikos A. Lorentzos, p. 149.

#10Dimitri Fontaine
dim@hi-media.com
In reply to: Jeff Davis (#8)
Re: Proposal: temporal extension "period" data type

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

Le 26 mai 08 à 20:36, Jeff Davis a écrit :

Instead of one datatype, I'd like to see the capability of defining
period types based on any datatype with suitable opclass. Periods of
dates and floats, for example, would be just as useful as periods of
timestamps.

I agree, although we couldn't use the word "period" to describe floats
(at least, that's not how I understand the word).

In order to get a good enough implementation of text based prefix
searches (where prefix_column @> 'literal') I coded a prefix_range
datatype, which would be the text period here...
http://pgfoundry.org/projects/prefix

So I'd propose "range" as the generic "period" name, and let's extend
it to text too.
And while at it maybe we could index suffix searches in addition to
prefix searches...

Some tests show that even without specific index support, prefix @>
text is quicker than text like 'prefix%', so maybe it would be
possible to have some generic way to use ranges under the hood to
speed-up some specific queries...
Applied to timestamps and "period", this would allow using period
contains operator for handling between, e.g. Dunno if this part is a
good idea, but I nevertheless felt like raising it.

HTH,
- --
dim
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkg7JNsACgkQZnqmTbDljfBGFgCeJ43Q4XQPPl/mEX+AuJmGeMs+
e0sAnR1eD5/B6J+8VADK4Vuj3USXdZuv
=WhNz
-----END PGP SIGNATURE-----