Ranges for well-ordered types

Started by Michael Glaesemannover 19 years ago19 messages
#1Michael Glaesemann
grzm@seespotcode.net

I've been interested in representing and manipulating time ranges in
PostgreSQL, where a time range is defined by a start time and an end
time. Time ranges are useful, for example, in representing when some
predicate was known valid. Similarly, time ranges can be used to
represent "transaction time": the version history of the data itself.
This is similar to the "time travel" feature in previous versions of
PostgreSQL. (Tables that include both valid time and transaction time
information are sometimes called bitemporal tables.) Ranges can also
be useful in scheduling applications.

The original use case that prompted me to explore this area was
tracking what time periods teachers were assigned to schools (a valid-
time table). Here's one way to represent this in PostgreSQL:

create table teachers__schools_1
(
teacher text not null
, school text not null
, from_date date not null
, to_date date not null
, check (from_date <= to_date)
, unique (teacher, school, from_date)
, unique (teacher, school, to_date)
);

Each row of this table represents the time range (from from_date to
to_date) during which a teacher was assigned to a particular school.
(Teachers can be assigned to more than one school at a time.) The
check constraint guarantees that [from_date, to_date] represents a
valid closed-closed interval (where the end points are included in
the range). Two unique constraints are necessary to guarantee
uniqueness for each row. In my use case, it would also be desirable
to apply constraints to prevent overlapping and ensure continuity--
that teachers are always at some school. This can be done using
constraint triggers, though making sure all of the comparisons for
four dates (beginning and end points for two ranges) are done
properly can be a little daunting and prone to bugs.

The problem of representing time ranges in a relational database has
been worked on for quite some time. Indeed, the PostgreSQL source
still includes a tinterval type, where the beginning and end points
are of type abstime, though this type is no longer included in the
documentation. Drafts of SQL3 included the PERIOD constructor to
represent date, time, and timestamp ranges as part of SQL/Temporal,
but this section of the specification was not included in the final
SQL:2003 standard. At least two relatively prominent books [1]Richard T. Snodgrass, Developing Time-Oriented Database Applications in SQL, Morgan Kaufmann, 2000. ISBN 1-55860-436-7. Out of print, but available from the author as a PDF. (http://www.cs.arizona.edu/people/rts/tdbbook.pdf)[2]Chris Date, Hugh Darwen, Nikolas Lorentzos, Temporal Data and the Relational Model, Morgan Kaufmann, 2002. ISBN 1-55860-855-9 (http:// www.amazon.com/gp/product/1558608559/) as
well as numerous papers have been written on the subject.

An aside regarding terminology: In the literature I've read, time
ranges are most often referred to as intervals. However, SQL already
uses INTERVAL to refer to another distinct type. As mentioned above,
SQL3 used the term PERIOD for a constructor to create types with a
beginning point and an end point. RANGE is a reserved keyword in SQL:
2003 (related, I believe, to windowed tables). Range also has a
distinct meaning in relational calculus. I'm at a bit of a loss as to
how to refer to these structures with a beginning and an end point
with a term that isn't already reserved in SQL or may be in the
future. Suggestions welcome :) Span? Reuse tinterval? For the time
being, I'll arbitrarily continue to use range.

In the first six months of 2006 alone, I've noted quite a few threads
related to time ranges in the various PostgreSQL mailing lists, so it
seems this issue is ripe for a solution. Just two weeks ago, Albert
Hertroys started work on a vector type[3]Vector type (Re: [GENERAL] challenging constraint situation - how do I make it) (http://archives.postgresql.org/pgsql-general/2006-05/ msg01279.php) , where he rightly notes
that time ranges are just an application of a general range (or
vector, to use his term) that could just as easily be used with
"integers, reals, points, dates, etc". For the past couple of months,
I've been working with composite types and PL/pgSQL to define and
manipulate date ranges and integer ranges, and see what can be
abstracted out to a general range type.

In the general case, a particular range value can be represented as
two point values. For example, the date range [2006-01-01,
2006-05-31] (using the closed-closed interval representation) is
represented by the two date "point" values 2006-01-01 and 2006-05-31.
The interval range [3,9] is represented by the two integer "point"
values 3 and 9. A range can be formed for any point type, where a
point type is any type that's well-ordered:
* the range of values is bounded (the number of values in the type
is finite)
* comparisons are well-defined for any two values, and
* for any point p, the next point can be found using a successor
function

Given a well-ordered "point" type, common questions of ranges can be
be answered, such as, for two ranges r1 and r2
* Do r1 and r2 overlap?
* Does r1 meet r2?
* Is the union of r1 and r2 defined, and if so, what is it?
* Is the intersection of r1 and r2 defined, and if so, what is it?

The way I've thought of implementing ranges in PostgreSQL is through
an additional system catalog table (called pg_ordered_type for
convenience) that would store the necessary information for each
point type:
* ordtype: the point type (referencing pg_type.oid)
* ordcmpfunc: the comparison function (referencing pg_proc.oid)
* ordsucc: the successor function (referencing pg_proc.oid)
* ordmin, ordmax the minimum and maximum values for the point type
(not sure how this would be represented)

From one point of view, the successor function and minimum and
maximum values (and perhaps the comparison function) are intrinsic
attributes of the type itself. However, I can see potential
usefulness using the same "base" type with different successor
functions or different maximum and minimum values. For example, to
represent a point type of even numbers, the base type could be
integer, and the successor function would "p + 2". (This raises the
question of how to restrict the values of the type to even numbers:
perhaps using domains would be a solution.) Another example would be
using timestamps of different precision: timestamp(6) has a different
successor function than timestamp(0).

It may be desirable to include in pg_ordered_type a "predecessor"
function (returning the point prior to a point p) and a duration
function (providing a more efficient way to return the number of
points included in the interval than iterating over the successor
function).

With comparison and the successor function, one can define the 13
predicates (often called Allen operators) describing the relationship
between any two range values: equals, before, meets, overlaps,
starts, during, finishes, and inverses of the last six.

Note that two of the built-in numeric types--real and double-
precision--are not well-ordered, and therefore ranges for real and
double-precision would not be defined. The primary reason for this is
that without a successor function, it's not possible to determine
whether or not two range values meet (determined by the meets Allen
operator). And without a definition for meets, it's not possible to
define continuity--to use the above example, that a teacher must
always be assigned to some school. In that respect, the
implementation I'm proposing here differs from the vector type
proposed by Albert Hertroys. Perhaps there's a way to have a "looser"
form of ranges that do not include a successor function, and for
those looser ranges, continuity constraints couldn't be defined.
However, the issues arising from the inexactness of real and double
precision calls into question the usefulness of functions that rely
on testing for equality (e.g., equals, meets, starts, finishes, and
inverses of the last three).

Another issue for a general range implementation is uniqueness.
Currently btree indexes are used to enforce uniqueness in PostgreSQL.
By somewhat arbitrarily defining a comparison result for the 13
possible predicates, I've been able to define an opclass for ranges
so btree indexes can be defined. Note that in defining an opclass,
comparision is already required, so perhaps the comparison function
could be stored someplace other than the pg_ordered_type table.

As for other indexes, I've looked at some of the GiST material and
think that it probably can be usefully applied to ranges: for example
the ip4r type[4]ip4r (http://pgfoundry.org/projects/ip4r/), which defines ranges of IPv4 addresses, uses GiST
for indexing. However, GiST is currently well over my head, so this
is speculation on my part.

Returning to my original example, with a "date_range" type, the table
could be defined as:

create table teachers__schools_2
(
teacher text not null
, school text not null
, period date_range not null
, primary key (teacher, school, period)
);

The original check constraint is handled by the date_range type and
the two unique constraints are replaced by a single primary key
constraint. Constraints for overlapping and continuity are still
handled using constraint triggers, but are easier to implement using
functions available to compare ranges rather than handling beginning
and end points individually.

I believe ranges in PostgreSQL would be useful and satisfy a need
expressed in numerous mailing list threads. I hope to do my part in
their implementation. I look forward to your feedback.

Michael Glaesemann
grzm seespotcode net

[1]: Richard T. Snodgrass, Developing Time-Oriented Database Applications in SQL, Morgan Kaufmann, 2000. ISBN 1-55860-436-7. Out of print, but available from the author as a PDF. (http://www.cs.arizona.edu/people/rts/tdbbook.pdf)
Applications in SQL, Morgan Kaufmann, 2000. ISBN 1-55860-436-7. Out
of print, but available from the author as a PDF.
(http://www.cs.arizona.edu/people/rts/tdbbook.pdf)
[2]: Chris Date, Hugh Darwen, Nikolas Lorentzos, Temporal Data and the Relational Model, Morgan Kaufmann, 2002. ISBN 1-55860-855-9 (http:// www.amazon.com/gp/product/1558608559/)
Relational Model, Morgan Kaufmann, 2002. ISBN 1-55860-855-9 (http://
www.amazon.com/gp/product/1558608559/)
[3]: Vector type (Re: [GENERAL] challenging constraint situation - how do I make it) (http://archives.postgresql.org/pgsql-general/2006-05/ msg01279.php)
do I make it) (http://archives.postgresql.org/pgsql-general/2006-05/
msg01279.php)
[4]: ip4r (http://pgfoundry.org/projects/ip4r/)

#2Ian Caulfield
ian.caulfield@gmail.com
In reply to: Michael Glaesemann (#1)
Re: Ranges for well-ordered types

On 6/10/06, Michael Glaesemann <grzm@seespotcode.net> wrote:

Returning to my original example, with a "date_range" type, the table
could be defined as:

create table teachers__schools_2
(
teacher text not null
, school text not null
, period date_range not null
, primary key (teacher, school, period)
);

The original check constraint is handled by the date_range type and
the two unique constraints are replaced by a single primary key
constraint. Constraints for overlapping and continuity are still
handled using constraint triggers, but are easier to implement using
functions available to compare ranges rather than handling beginning
and end points individually.

I've done similar date range things by creating a composite type consisting
of the lower and upper bounds, and then implementing a btree opclass where
the comparator returns 0 if two ranges overlap - this allows a current btree
index to enforce non-overlapping ranges, and allows indexed lookup of which
range contains a particular value.

Not sure whether this covers your scenario, but it works fairly well for me
:)

Ian

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ian Caulfield (#2)
Re: Ranges for well-ordered types

"Ian Caulfield" <ian.caulfield@gmail.com> writes:

I've done similar date range things by creating a composite type consisting
of the lower and upper bounds, and then implementing a btree opclass where
the comparator returns 0 if two ranges overlap - this allows a current btree
index to enforce non-overlapping ranges, and allows indexed lookup of which
range contains a particular value.

And how hard did you test this? Non-transitive "equality" is certain to
confuse btree, leading to wrong answers.

regards, tom lane

#4Joshua D. Drake
jd@commandprompt.com
In reply to: Ian Caulfield (#2)
Re: Ranges for well-ordered types

Ian Caulfield wrote:

On 6/10/06, *Michael Glaesemann* <grzm@seespotcode.net
<mailto:grzm@seespotcode.net>> wrote:

Returning to my original example, with a "date_range" type, the table
could be defined as:

create table teachers__schools_2
(
teacher text not null
, school text not null
, period date_range not null
, primary key (teacher, school, period)
);

The original check constraint is handled by the date_range type and
the two unique constraints are replaced by a single primary key
constraint. Constraints for overlapping and continuity are still
handled using constraint triggers, but are easier to implement using
functions available to compare ranges rather than handling beginning
and end points individually.

I've done similar date range things by creating a composite type
consisting of the lower and upper bounds, and then implementing a
btree opclass where the comparator returns 0 if two ranges overlap -
this allows a current btree index to enforce non-overlapping ranges,
and allows indexed lookup of which range contains a particular value.

Not sure whether this covers your scenario, but it works fairly well
for me :)

Why not define a start_date and end_date to determine range, and then
use the date overlap functions in postgresql?

Joshua D Drake

Show quoted text

Ian

#5Michael Glaesemann
grzm@seespotcode.net
In reply to: Michael Glaesemann (#1)
Re: Ranges for well-ordered types

On Jun 10, 2006, at 23:51 , Michael Glaesemann wrote:

A range can be formed for any point type, where a point type is
any type that's well-ordered:
* the range of values is bounded (the number of values in the type
is finite)
* comparisons are well-defined for any two values, and
* for any point p, the next point can be found using a successor
function

It was pointed out to me off list that I got my definition of well-
ordered wrong. I was confusing the definition of well-ordered with
the overall requirements that I was using to define ranges.

Well-ordered is just that for any two values a and b of a given type,
a < b is defined. That's what I was attempting to get at in the
second point above. The added requirements of having the type bounded
(which is going to happen on a computer anyway) and having a
successor function are still required for the range definition, but
not part of the definition of well-orderedness per se.

Michael Glaesemann
grzm seespotcode net

#6Michael Glaesemann
grzm@seespotcode.net
In reply to: Ian Caulfield (#2)
Re: Ranges for well-ordered types

On Jun 11, 2006, at 0:54 , Ian Caulfield wrote:

I've done similar date range things by creating a composite type
consisting of the lower and upper bounds, and then implementing a
btree opclass where the comparator returns 0 if two ranges overlap
- this allows a current btree index to enforce non-overlapping
ranges, and allows indexed lookup of which range contains a
particular value.

As Tom already pointed out, this method leads to problems with btree
indexes. I haven't heavily tested my own implementation (below), but
it only returns 0 for equality, which is what btree expects. All
other possible relationships between two ranges have a well-defined
result of -1 or 1. I believe this should be enough to prevent any
transitivity issues with btree.

Michael Glaesemann
grzm seespotcode net

create type interval_date as
(
_1 point_date
, _2 point_date
);
comment on type interval_date is
'The internal representation of date intervals, representing the
closed-closed '
'interval [_1,_2]';

create function interval_cmp(
interval_date -- $1 i1
, interval_date -- $2 i2
) returns integer
strict
immutable
security definer
language plpgsql as '
declare
i1 alias for $1;
i2 alias for $2;
cmp integer;
begin
perform check_intervals(i1,i2);

cmp := 1;

if i1._1 = i2._1
and i1._2 = i2._2
then cmp := 0;
else
if (i1._2 < i2._2)
or (i1._2 = i2._2
and i1._1 > i2._1)
then cmp = -1;
end if;
end if;

return cmp;
end;
';

#7Michael Glaesemann
grzm@seespotcode.net
In reply to: Michael Glaesemann (#6)
Re: Ranges for well-ordered types

On Jun 11, 2006, at 2:34 , Michael Glaesemann wrote:

On Jun 11, 2006, at 0:54 , Ian Caulfield wrote:

I've done similar date range things by creating a composite type
consisting of the lower and upper bounds, and then implementing a
btree opclass where the comparator returns 0 if two ranges overlap
- this allows a current btree index to enforce non-overlapping
ranges, and allows indexed lookup of which range contains a
particular value.

As Tom already pointed out, this method leads to problems with
btree indexes. I haven't heavily tested my own implementation
(below), but it only returns 0 for equality, which is what btree
expects. All other possible relationships between two ranges have a
well-defined result of -1 or 1. I believe this should be enough to
prevent any transitivity issues with btree.

Of course, this method doesn't provide the non-overlapping
constraint. That still needs to be handled by a constraint trigger.

Michael Glaesemann
grzm seespotcode net

#8Bruno Wolff III
bruno@wolff.to
In reply to: Michael Glaesemann (#1)
Re: Ranges for well-ordered types

On Sat, Jun 10, 2006 at 23:51:58 +0900,
Michael Glaesemann <grzm@seespotcode.net> wrote:

Each row of this table represents the time range (from from_date to
to_date) during which a teacher was assigned to a particular school.
(Teachers can be assigned to more than one school at a time.) The
check constraint guarantees that [from_date, to_date] represents a
valid closed-closed interval (where the end points are included in
the range). Two unique constraints are necessary to guarantee

I think you might want to reconsider your design. It works well for dates
because sets of dates are made of of isolated points and such sets are
both open and closed. If you are using time, I think it will be more convenient
to use a closed, open representation.

#9Michael Glaesemann
grzm@seespotcode.net
In reply to: Bruno Wolff III (#8)
Re: Ranges for well-ordered types

On Jun 11, 2006, at 5:15 , Bruno Wolff III wrote:

I think you might want to reconsider your design. It works well for
dates
because sets of dates are made of of isolated points and such sets are
both open and closed. If you are using time, I think it will be
more convenient
to use a closed, open representation.

Under design I proposed, closed-closed and closed-open are just two
different representations of the same range: to the commonly used
notation, the closed-open range [p1, p2) is equivalent to the closed-
closed range [p1, next(p2)], where next() is the successor function.
I agree than depending on the context, it may be better to use one
representation than the other (a budget meeting that lasts from 10:00
until 11:00 meets but doesn't share any points with an early lunch
meeting that starts at 11:00). Perhaps there should be probably some
to_char functions to format the range in the desired form.

Time (and timestamp) is a bit of a issue conceptually. The "default"
successor function would depend on the precision of the timestamp.
timestamp(0) would have a successor function of + 1 second, while
timestamp(3) would have a successor function of + .001 second. In the
above example, Monday's budget meeting in Tokyo from 10:00 until
11:00 could be represented with ranges of timestamp(0) with time zone as
[2006-06-12 10:00:00+09, 2006-06-12 11:00:00+09)
or as
[2006-06-12 10:00:00+09, 2006-06-12 10:59:59+09]

With timestamp(3) with time zone, that'd be
[2006-06-12 10:00:00.000+09, 2006-06-12 11:00:00.000+09)
or as
[2006-06-12 10:00:00.000+09, 2006-06-12 10:59:59.999+09]

Most people would be more comfortable with the first representation
of each pair, but the two representations in each pair represent the
same range.

For a lot of scheduling applications, using timestamps with a
precision greater that 0 probably wouldn't be very useful (and when
not using integer datetimes, not all that exact). Indeed, some
scheduling applications may want a precision of 1 minute, rather than
1 second, or perhaps a precision of 15 minutes, or even an hour. I
see this as a limitation of the timestamp type, and perhaps a
workaround could be found using check constraints and more
sophisticated successor functions.

For example, a first cut of a successor function for a timestamp with
precision of 1 hour might use + 3600 seconds, but the difference in
seconds between the top of any two hours may not necessarily be 3600
seconds in some corner cases when the calendar has changed. In those
cases, the successor function would need to be sure to return the
next hour, rather than the previous hour + 3600 seconds. (Perhaps the
calendar has never made a change where this would be a problem, but
for some arbitrary timestamp precision, for example 1 day, this could
be true. I haven't done enough research yet to determine how much of
a problem this is. In those cases it might be better to use dates
than timestamps.)

With time zones and daylight saving time, this becomes even more
interesting, especially for time zone offsets that aren't integral
hours (e.g., South Australia Standard Time +9:30, Iran Time +3:30,
India Time +5:30). A 1 hour precision requirement would need to
include the applicable time zone. There's been previous discussion of
including such time zone information in the timestamp value, but as
far as I know, no work has been done in that direction yet.

These are interesting questions, and improvements in timestamp can
make ranges even more convenient. I still see utility in ranges using
the current timestamp implementation as well.

Michael Glaesemann
grzm seespotcode net

#10Jim C. Nasby
jnasby@pervasive.com
In reply to: Michael Glaesemann (#9)
Re: Ranges for well-ordered types

On Sun, Jun 11, 2006 at 10:18:11AM +0900, Michael Glaesemann wrote:

On Jun 11, 2006, at 5:15 , Bruno Wolff III wrote:

I think you might want to reconsider your design. It works well for
dates
because sets of dates are made of of isolated points and such sets are
both open and closed. If you are using time, I think it will be
more convenient
to use a closed, open representation.

Under design I proposed, closed-closed and closed-open are just two
different representations of the same range: to the commonly used
notation, the closed-open range [p1, p2) is equivalent to the closed-
closed range [p1, next(p2)], where next() is the successor function.

Why try messing aronud with a successor function when you can just use <
instead of <= ?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#11Bruno Wolff III
bruno@wolff.to
In reply to: Michael Glaesemann (#9)
Re: Ranges for well-ordered types

On Sun, Jun 11, 2006 at 10:18:11 +0900,
Michael Glaesemann <grzm@seespotcode.net> wrote:

Time (and timestamp) is a bit of a issue conceptually. The "default"
successor function would depend on the precision of the timestamp.

And in the ideal case it doesn't exist. That is why I think a closed, open
interval is a better way to go.

#12Michael Glaesemann
grzm@seespotcode.net
In reply to: Jim C. Nasby (#10)
Re: Ranges for well-ordered types

On Jun 11, 2006, at 10:57 , Jim C. Nasby wrote:

On Sun, Jun 11, 2006 at 10:18:11AM +0900, Michael Glaesemann wrote:

Under design I proposed, closed-closed and closed-open are just two
different representations of the same range: to the commonly used
notation, the closed-open range [p1, p2) is equivalent to the
closed-
closed range [p1, next(p2)], where next() is the successor function.

Why try messing aronud with a successor function when you can just
use <
instead of <= ?

If I understand you correctly, you're pointing out that the
constraints for a valid range in closed-closed and closed-open
representation are different:

for a valid closed-open range r1 [a1, b1), a1 < b1
for a valid closed-closed range r2 [a2, b2], a2 <= b2

That's different from being able to show equivalence between two
ranges in different representations, e.g., r1 = r2 iff a1 = a2 and b1
= next(b2). As Bruno pointed out earlier, in some cases, a closed-
open representation is desirable, and I think that in others, such as
date ranges, a closed-closed representation is useful. Another place
where I'd use a closed-closed representation would be for describing
score ranges for grades (e.g., 70-79 is a C, 80-89 is a B). I'm not
sure how to go about converting between these two representations
without using a successor (or predecessor) function.

Another way of looking at not having a successor function is not
restricting ranges to be comprised of discrete point types. In that
case, you can't really use closed-closed representation at all (or
open-open, for that matter), because the successor function is
necessary to define the meets and before predicates, or to convert
between closed-closed and closed-open representations, if one is
using closed-open representation internally.

Another wrinkle in cases without a defined successor function is
boundary cases: ranges that include one or both of the bounds of the
point type. For example, with a closed-open representation, a range
can't include the upper bound of the point type. Perhaps a way around
this would be to allow infinity as a possible value: the date range
['2006-06-11', 'infinity') would describe a range from June 11, 2006
through the upper bound of the date type (5874897-12-31 on my
machine, though interestingly enough:

postgres=# SELECT '5874897-12-31'::date;
date
---------------
5874897-12-31
(1 row)

postgres=# SELECT '5874897-12-31'::date + 2;
?column?
---------------
5874898-01-02
(1 row)

postgres=# SELECT '5874898-01-02'::date;
ERROR: date out of range: "5874898-01-02"
postgres=# SELECT ('5874897-12-31'::date + 2)::date;
date
---------------
5874898-01-02
(1 row)

postgres=# SELECT '5874897-12-31'::date + interval '2 days';
?column?
----------------------------
29357-07-06 15:41:44.48384
(1 row)

postgres=# SELECT ('5874897-12-31'::date + interval '2 days')::date;
date
-------------
29357-07-06
(1 row)

postgres=# select version();

version
------------------------------------------------------------------------
----------------------------------------------------------------------
PostgreSQL 8.1.4 on powerpc-apple-darwin8.6.0, compiled by GCC
powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc.
build 5341)
(1 row)

That looks a bit odd. :(

Also, a successor function is very useful in testing other
predicates. To keep things simpler, I'm going to use the same
representation for both ranges, as internally you'd probably convert
the ranges to some canonical representation for comparison. Whether
that canonical representation is closed-closed, closed-open, or a
point and an interval doesn't really matter.

Practically, I think being able to use both closed-closed and closed-
open representations as needed is useful, and for most purposes, the
types can be considered discrete and bounded. Is there a lot to be
gained by not defining a successor function?

Michael Glaesemann
grzm seespotcode net

#13Michael Glaesemann
grzm@seespotcode.net
In reply to: Bruno Wolff III (#11)
Re: Ranges for well-ordered types

On Jun 11, 2006, at 14:45 , Bruno Wolff III wrote:

On Sun, Jun 11, 2006 at 10:18:11 +0900,
Michael Glaesemann <grzm@seespotcode.net> wrote:

Time (and timestamp) is a bit of a issue conceptually. The "default"
successor function would depend on the precision of the timestamp.

And in the ideal case it doesn't exist. That is why I think a
closed, open
interval is a better way to go.

How would you go about converting a closed-open representation to a
closed-closed representation for display purposes? Or inserting data
that is provided in closed-closed representation?

Michael Glaesemann
grzm seespotcode net

#14Jim C. Nasby
jnasby@pervasive.com
In reply to: Michael Glaesemann (#13)
Re: Ranges for well-ordered types

On Sun, Jun 11, 2006 at 03:22:55PM +0900, Michael Glaesemann wrote:

On Jun 11, 2006, at 14:45 , Bruno Wolff III wrote:

On Sun, Jun 11, 2006 at 10:18:11 +0900,
Michael Glaesemann <grzm@seespotcode.net> wrote:

Time (and timestamp) is a bit of a issue conceptually. The "default"
successor function would depend on the precision of the timestamp.

And in the ideal case it doesn't exist. That is why I think a
closed, open
interval is a better way to go.

How would you go about converting a closed-open representation to a
closed-closed representation for display purposes? Or inserting data
that is provided in closed-closed representation?

Perhaps it makes more sense to consider the four possibilities
{ ( ), ( ], [ ), [ ] }
as different data types.

I realize that mathematically, there's probably plenty of reasons to
convert between closed and open on both ends (though I admit I can't
think of any reason to do this), but my focus is on what I believe to be
(by far and away) the most common PostgreSQL use case: defining
timestamp ranges. And that's something that needs to be closed, open.
(Sadly, I see people mess that up frequently.)

If this case can be well satisfied by an interval type that depends on a
sucessor function without a bunch of complexities (in code or
operation), then that's great. I'm worried that that's not the case (the
issue of various timestamp precisions is worrying enough by itself), and
I'd much rather see a solid closed, open interval added than nothing at
all.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#15Josh Berkus
josh@agliodbs.com
In reply to: Michael Glaesemann (#1)
Re: Ranges for well-ordered types

Michael,

First off, mark me down as one of the people interested in having this ...
I've been hacking a lot of the same functionality using data-push functions
and triggers, and boy howdy, it's messy.

I do think Jim is right, though, in that we may want to look for portions of
the functionality which are achievable in the context of one PostgreSQL
version, unless you're going to be working full-time on this patch. Or
maybe queue it up for next summer's Summer of Code. Not sure what that
portion would be, though.

In real-world calendaring applications, I *certainly* see the need for a
successor function. However, that would require being able to define
timestamps with a variable precision, e.g. TIMESTAMP('5 minutes'). This, by
itself would be a significant effort, yet useful ... maybe that's where to
start?

You're probably going to have to give up on B-Tree indexes for PERIODs, and
look towards GiST. For one thing, I would see UNIQUE in the context of a
PERIOD defined as non-overlapping. e.g.:

Given
UNIQUE (name, period)
Name Period
Joe 2006-06-11 14:00:00->17:35:00
Marsha 2006-06-11 15:00:00->19:00:00
Ralph 2006-06-11 15:15:00->15:30:00

I would want (in a calendaring application) an attempt to insert:
Joe 2006-06-11 17:00:00->19:00:00
... to fail, as well as any attempt to:
UPDATE periods SET name = 'Marsha' WHERE name = 'Ralph';

... although it's possible that UNIQUE is not really the right name for that
kind of constraint, although it serves the same purpose. But I don't think
in this context that a B-Tree would be capable of fulfilling it.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

#16Martijn van Oosterhout
kleptog@svana.org
In reply to: Josh Berkus (#15)
Re: Ranges for well-ordered types

On Sun, Jun 11, 2006 at 11:22:41AM -0700, Josh Berkus wrote:

You're probably going to have to give up on B-Tree indexes for PERIODs, and
look towards GiST. For one thing, I would see UNIQUE in the context of a
PERIOD defined as non-overlapping. e.g.:

If GiST could define UNIQUE indexes, this probably would've been done
already. Fix that, and the index itself will appear shortly
afterwards...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#17Bruno Wolff III
bruno@wolff.to
In reply to: Michael Glaesemann (#12)
Re: Ranges for well-ordered types

On Sun, Jun 11, 2006 at 15:13:39 +0900,
Michael Glaesemann <grzm@seespotcode.net> wrote:

That's different from being able to show equivalence between two
ranges in different representations, e.g., r1 = r2 iff a1 = a2 and b1
= next(b2). As Bruno pointed out earlier, in some cases, a closed-
open representation is desirable, and I think that in others, such as
date ranges, a closed-closed representation is useful. Another place
where I'd use a closed-closed representation would be for describing
score ranges for grades (e.g., 70-79 is a C, 80-89 is a B). I'm not
sure how to go about converting between these two representations
without using a successor (or predecessor) function.

Date ranges are really closed open as well (as finite sets of isolated points
are both open and closed). The only oddity would be that the date used to
indicate the open end of the range might not be what the user expects.

#18Michael Glaesemann
grzm@seespotcode.net
In reply to: Josh Berkus (#15)
Re: Ranges for well-ordered types

Thanks to everyone for the feedback that I've received so far. It's
clear that there's interest in this.

On Jun 12, 2006, at 3:22 , Josh Berkus wrote:

I do think Jim is right, though, in that we may want to look for
portions of
the functionality which are achievable in the context of one
PostgreSQL
version, unless you're going to be working full-time on this patch.

I definitely agree with implementing it in parts. I doubt it's
possible, but perhaps a first bit might make it into 8.2 :)

In real-world calendaring applications, I *certainly* see the need
for a
successor function. However, that would require being able to define
timestamps with a variable precision, e.g. TIMESTAMP('5 minutes').
This, by
itself would be a significant effort, yet useful ... maybe that's
where to
start?

As mentioned in an earlier email, I think calendaring applications in
particular would benefit from timestamp precisions of less than 1
second, e.g., TIMESTAMP('5 minutes') or TIMESTAMP('1 hour'). However,
I think this is a thorny problem. To elaborate, I believe the
precision has to be relative to some "baseline". From 12:00, 30
minute precision would presumably allow 12:00, 12:30, 13:00, 13:30,
and so on. Precision of '1 hour' would allow 12:00, 13:00, 14:00, and
so on. But these are relative to the time zone they're in. While
12:00 in Tokyo (+9) would be a timestamp value with 1 hour precision,
that same timestamp is 4:30 in Tehran (+3:30) if I got the math
right. Is 4:30 a timestamp value with 1 hour precision? Because of
this, I think timestamp precisions of less than 1 second (timestamp
(0)) require storing the time zone as part of the timestamp value.

Pushing this even further, would we allow arbitrary precision? For
example, would 45-minute precision be allowed? In that case, I
believe we'd need to go further than storing just the time zone with
the timestamp value. The timestamp value would have to be relative to
some baseline timestamp to be able to calculate whether or not the
difference between any particular timestamp and the baseline
timestamp is integral. Perhaps this could be accomplished using
domains and some additional checking function? I'm not sure. It's
enough to make me want to forget about the idea of disallowing any
precision that is not an evenly divided into the next larger "time
part": any precision between 0 seconds and 1 minute would have to be
a number of seconds evenly divided into 60; between 1 hour and 1 day,
precision would have to be one of the values 1, 2, 3, 4, 6, 8, or 12
hours.

I've been able to discuss the issue of timestamp precision without
bringing up successor functions or ranges at all, and indeed I think
it's orthogonal to the range implementation. I think they're both
concepts that should be included in PostgreSQL, but as for myself,
I'm more interested in the range implementation than the the
timestamp precision issue.

By the way, anyone care to weigh in on what term we should use when
discussing this? Josh has used PERIOD. Should we go with that for now?

A somewhat related issue: would we want any implementation to follow
(at least part) of the not-yet-standard SQL/Temporal draft? Or would
it be more desirable to steer clear of using any terms/syntax that
was included in an attempt to prevent any possible conflict with a
future SQL spec?

You're probably going to have to give up on B-Tree indexes for
PERIODs, and
look towards GiST. For one thing, I would see UNIQUE in the
context of a
PERIOD defined as non-overlapping. e.g.:

I think that a non-overlapping constraint goes above and beyond what
UNIQUE requires. In my opinion, UNIQUE should test for equality,
rather than non-overlapping, as that keeps the meaning of UNIQUE
consistent across all types and may actually be useful in some
instances. I do think it would be convenient to have some sort of
syntax that would provide a non-overlapping constraint rather than
having to code up a constraint trigger every time you wanted to do
this. As Martijn pointed out, when GiST can be used for a UNIQUE
constraint, we should be able to define the non-overlapping
constraint quite easily. So this could be thought of as a third
orthogonal issue for ranges, the first two being the range type
constructor and timestamp precision < 1 second. Any one of these
three could be done independently and improve PostgreSQL. In
combination they are definitely a very nice package.

On Jun 13, 2006, at 13:25 , Bruno Wolff III wrote:

Date ranges are really closed open as well (as finite sets of
isolated points
are both open and closed). The only oddity would be that the date
used to
indicate the open end of the range might not be what the user expects.

I think it's definitely a matter of interpretation. [2006-01-01,
2006-12-31] and [2006-01-01, 2007-01-01) both include the same days.
Who's to say which is the "real" representation? For all practical
purposes (i.e., what can be represented within the database)
[2006-01-01 00:00:00+0, 2006-12-31 23:59:59] and [2006-01-01
00:00:00, 2007-01-01 00:00:00+0] represent the same timestamp(0) with
time zone ranges as well. While one might idealize time to be
continuous, as far as I know there isn't a way to represent time that
way in a computer, at the very least, not in PostgreSQL.

And for the very reason that it might not be what the user expects,
if there's a way to convert between closed-open and closed-closed as
appropriate, I think it makes it much more use friendly to do so. For
example, the closed-closed representation is equivalent to what
BETWEEN does. It would be very nice to be able to provide sometime
equivalent with ranges.

As for the successor function itself: Any "exact" datatype, such as
timestamp (at least with --enable-integer-datetimes), date, integer,
or numeric, has some built-in precision anyway and a successor
function follows quite directly from that precision. I don't see that
as problematic or even very difficult.

Thanks again for your comments, past, present and future! It's been
very helpful for me to hear from others on this.

Michael Glaesemann
grzm seespotcode net

#19Bruno Wolff III
bruno@wolff.to
In reply to: Michael Glaesemann (#18)
Re: Ranges for well-ordered types

On Wed, Jun 14, 2006 at 15:47:16 +0900,
Michael Glaesemann <grzm@seespotcode.net> wrote:

On Jun 13, 2006, at 13:25 , Bruno Wolff III wrote:

Date ranges are really closed open as well (as finite sets of
isolated points
are both open and closed). The only oddity would be that the date
used to
indicate the open end of the range might not be what the user expects.

I think it's definitely a matter of interpretation. [2006-01-01,
2006-12-31] and [2006-01-01, 2007-01-01) both include the same days.
Who's to say which is the "real" representation? For all practical

They are both real. In part my point was the reason the closed, closed form
works well for overlap checking is because the sets are also closed, open
and behave like that as well. (Though the user visible names are different.)

purposes (i.e., what can be represented within the database)
[2006-01-01 00:00:00+0, 2006-12-31 23:59:59] and [2006-01-01
00:00:00, 2007-01-01 00:00:00+0] represent the same timestamp(0) with
time zone ranges as well. While one might idealize time to be
continuous, as far as I know there isn't a way to represent time that
way in a computer, at the very least, not in PostgreSQL.

Which is a good reason to used the Closed, Open definition. Then you don't
have to work about whether postgres has been built with integer timestamps
or the details of the floating point hardware your database is running on.

And for the very reason that it might not be what the user expects,
if there's a way to convert between closed-open and closed-closed as
appropriate, I think it makes it much more use friendly to do so. For
example, the closed-closed representation is equivalent to what
BETWEEN does. It would be very nice to be able to provide sometime
equivalent with ranges.

I don't think it is unreasonable to have a different external representation
for date ranges and timestamp ranges. It isn't conistant, but I think it is
more readily understandable. Internally, it will probably be better to use
closed, open for both to keep the code consistant to allow for reuse and
better understandibility at that level.

As for the successor function itself: Any "exact" datatype, such as
timestamp (at least with --enable-integer-datetimes), date, integer,
or numeric, has some built-in precision anyway and a successor
function follows quite directly from that precision. I don't see that
as problematic or even very difficult.

The successor function for timestamp when not using integer datetimes is
going to depend on the underlying hardware. I think that will be a problem
unless you are planning to force the use of integer datetimes. I don't
think the project is ready for that yet, though in the long run I think it
is a better way to go.