Range types
I had proposed a temporal contrib module earlier and you wanted to see
support for many range types, not just timestamptz. So I had an idea on
how to implement this, but I wanted to see if you thought it was a
viable idea.
So basically I have an anyrange pseudo type with the functions prev,
next, last, etc defined. So instead of hard coding range types, we would
allow the user to define their own range types. Basically if we are able
to determine the previous and next values of the base types we'd be able
to define a range type. I'm envisioning in a manner much like defining
an enum type.
CREATE TYPE periodtz AS RANGE (timestamptz, '1 microsecond'::interval);
CREATE TYPE numrange AS RANGE (numeric(8,2));
-- determine granularity from typmod
CREATE TYPE floatrange AS RANGE (float, '0.000000001'::float);
Or getting really crazy...
CREATE TYPE terms AS ENUM ('2000_F', '2000_W', '2000_S', '2000_Su'...
'2010_F', '2010_W', '2010_S', '2010_Su');
CREATE TYPE termrange AS RANGE (terms);
So basically I have a pg_range table to store the base typeid, a text
field for the granule value and the granule typeid.
I doubt we would be able to get this in for the 8.5 release, especially
since I'm still learning C and the Postgres internals. Jeff Davis is
going to get something in before the next commit fest so we'll have some
type of temporal/range support. But we wanted to see what direction the
community felt we should go.
Scott Bailey
Scott Bailey <artacus@comcast.net> wrote:
CREATE TYPE periodtz AS RANGE (timestamptz, '1 microsecond'::interval);
What does the second argument mean? Is it a default interval?
So basically I have a pg_range table to store the base typeid, a text
field for the granule value and the granule typeid.
As another approach, what about storing typeid in typmod?
(Oid can be assumed to be stored in int32.)
For example,
CREATE TABLE tbl ( r range(timestamp) );
SELECT '[ 2.0, 3.0 )'::range(float);
There might be some overhead to store typeid for each range instance,
but the typmod approach does not require additional catalogs and syntax
changes. It could even be possible on 8.4.
Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
On Sun, Dec 13, 2009 at 11:49:53PM -0800, Scott Bailey wrote:
I had proposed a temporal contrib module earlier and you wanted to see
support for many range types not just timestamptz [...]
So basically I have an anyrange pseudo type with the functions prev, next,
last, etc defined. So instead of hard coding range types, we would allow
the user to define their own range types. Basically if we are able to
determine the previous and next values of the base types we'd be able to
define a range type. I'm envisioning in a manner much like defining an enum
type.
CREATE TYPE periodtz AS RANGE (timestamptz, '1 microsecond'::interval);
CREATE TYPE numrange AS RANGE (numeric(8,2));
-- determine granularity from typmod
CREATE TYPE floatrange AS RANGE (float, '0.000000001'::float);
I might be asking the same as Itagaki is (see below) but... are you just
envisioning ranges on 'discrete' types? What about ranges on floats or
(arbitrary length) strings, where there is no prev/next? Too difficult?
(mind you: I don't know exactly what I'm talking about, but it would
definitely be useful).
On Mon, Dec 14, 2009 at 05:10:24PM +0900, Takahiro Itagaki wrote:
Scott Bailey <artacus@comcast.net> wrote:
CREATE TYPE periodtz AS RANGE (timestamptz, '1 microsecond'::interval);
What does the second argument mean? Is it a default interval?
I think it's the granularity. That defines how to find the 'next' point
in time. As I understood it, Scott envisions ranges only for discrete
types, i.e. those advancing in well-defined steps. Note that (a) I might
be wrong and (b) there might be a very good reason for doing it this way.
So basically I have a pg_range table to store the base typeid, a text
field for the granule value and the granule typeid.
As another approach, what about storing typeid in typmod?
(Oid can be assumed to be stored in int32.)
For example,
CREATE TABLE tbl ( r range(timestamp) );
SELECT '[ 2.0, 3.0 )'::range(float);
There might be some overhead to store typeid for each range instance,
but the typmod approach does not require additional catalogs and syntax
changes. It could even be possible on 8.4.
This looks more natural to me too.
Regards
-- tomás
On Mon, Dec 14, 2009 at 4:06 AM, <tomas@tuxteam.de> wrote:
As another approach, what about storing typeid in typmod?
(Oid can be assumed to be stored in int32.)
For example,
CREATE TABLE tbl ( r range(timestamp) );
SELECT '[ 2.0, 3.0 )'::range(float);
There might be some overhead to store typeid for each range instance,
but the typmod approach does not require additional catalogs and syntax
changes. It could even be possible on 8.4.
This looks more natural to me too.
It's very different from the way we've traditionally used typmod,
though, which Tom described pretty well here:
http://archives.postgresql.org/pgsql-hackers/2009-11/msg01183.php
For example, function signatures ignore typmod, so you'll be able to
write a function that takes a range, but you won't know what kind of
range you're getting. Pavel proposed changing that, but the problem
is that while you might want to discriminate on the basis of what sort
of range you're getting, you probably DON'T want to discriminate on
the length of the character string being passed in with a varchar
argument, or the number of decimal places in a numeric.
So I think this is going to be awkward.
Also, typid is unsigned and typmod is signed. Again, awkward. Maybe
with a big enough crowbar you can make it work, but it seems like it
won't be pretty...
...Robert
On Mon, Dec 14, 2009 at 06:02:04AM -0500, Robert Haas wrote:
On Mon, Dec 14, 2009 at 4:06 AM, <tomas@tuxteam.de> wrote:
[...]
This looks more natural to me too.
It's very different from the way we've traditionally used typmod,
though, which Tom described pretty well here:
http://archives.postgresql.org/pgsql-hackers/2009-11/msg01183.php
For example, function signatures ignore typmod, so you'll be able to
write a function that takes a range, but you won't know what kind of
range you're getting [...]
Also, typid is unsigned and typmod is signed. Again, awkward [...]
Ugh. I see. Thank you for this very insightful comment.
Regards
-- tomás
Scott Bailey <artacus@comcast.net> writes:
So basically I have an anyrange pseudo type with the functions prev, next,
last, etc defined. So instead of hard coding range types, we would allow the
user to define their own range types. Basically if we are able to determine
the previous and next values of the base types we'd be able to define a
range type. I'm envisioning in a manner much like defining an enum
type.
It's not clear how to define those functions for the prefix_range
datatype, where '123' represents any text beginning with those chars and
'123[4-6]' any text beginning with '123' then either 4, 5 or 6.
What is SELECT next('123'::prefix_range); supposed to return?
Regards,
--
dim
PS: as I'm used to using this in a telephony context, the examples
contain digits, but it's a text-based type and, given questions and
reports on pgsql-general, people do use it with text ranges too.
On Sun, Dec 13, 2009 at 11:49:53PM -0800, Scott Bailey wrote:
So basically I have an anyrange pseudo type with the functions prev,
next, last, etc defined. So instead of hard coding range types, we would
allow the user to define their own range types. Basically if we are able
to determine the previous and next values of the base types we'd be able
to define a range type. I'm envisioning in a manner much like defining
an enum type.
I find it odd that you could define functions next() and prev() since
that assumes some kind of discretisation which simply does not exist for
most types I can think of.
It would seem to me the really useful uses of ranges would be the
operations overlaps, disjoint, precedes, follows, etc., which could all
be defined on any well-ordered type (wherever greater-than and
less-than are well defined). No need to discretise anything.
Do you have any actual use case for a discretized range type for
timestamp?
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.
Scott Bailey <artacus@comcast.net> writes:
So basically I have an anyrange pseudo type with the functions prev,
next, last, etc defined. So instead of hard coding range types, we would
allow the user to define their own range types. Basically if we are able
to determine the previous and next values of the base types we'd be able
to define a range type. I'm envisioning in a manner much like defining
an enum type.
I think array types, not enums, would be a better model.
The real question is how the heck granularity enters into it. Why
should a range type require that? I think you are mixing up two
concepts that would be better kept separate.
In particular, the granularity examples you give seem to assume that
the underlying datatype is exact not approximate --- which among other
things will mean that it fails to work for float timestamps. Since
timestamps are supposedly the main use-case, that's pretty troubling.
regards, tom lane
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
As another approach, what about storing typeid in typmod?
(Oid can be assumed to be stored in int32.)
No, it cannot --- you'd be one bit short. The general rule for typmod
is that all negative values are treated as "unspecified". Even if you
managed to find and change every place that assumed that, you'd still
have a failure for -1, which is a perfectly legal value of Oid.
regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Dec 14, 2009 at 4:06 AM, <tomas@tuxteam.de> wrote:
As another approach, what about storing typeid in typmod?
(Oid can be assumed to be stored in int32.)
It's very different from the way we've traditionally used typmod,
Aside from the problems already pointed out, there's another: this
definition usurps the ability to attach a typmod to the range's
base type. Again, you should be looking at arrays not enums for
a reference example. The typmod of an array feeds through to its
element type.
regards, tom lane
Martijn van Oosterhout wrote:
On Sun, Dec 13, 2009 at 11:49:53PM -0800, Scott Bailey wrote:
So basically I have an anyrange pseudo type with the functions prev,
next, last, etc defined. So instead of hard coding range types, we would
allow the user to define their own range types. Basically if we are able
to determine the previous and next values of the base types we'd be able
to define a range type. I'm envisioning in a manner much like defining
an enum type.
I find it odd that you could define functions next() and prev() since
that assumes some kind of discretisation which simply does not exist for
most types I can think of.
Because intervals (mathematical not SQL) can be open or closed at each
end point, we need to know what the next and previous values would be at
the specified granularity. And while you can do some operations without
knowing this, there are many you can't. For instance you could not tell
whether two [] or () ranges were adjacent, or be able to coalesce an
array of ranges.
It would seem to me the really useful uses of ranges would be the
operations overlaps, disjoint, precedes, follows, etc., which could all
be defined on any well-ordered type (wherever greater-than and
less-than are well defined). No need to discretise anything.
Do you have any actual use case for a discretized range type for
timestamp?
Scott Bailey <artacus@comcast.net> writes:
Because intervals (mathematical not SQL) can be open or closed at each
end point we need to know what the next and previous value would be at
the specified granularity. And while you can do some operations without
knowing this, there are many you can't. For instance you could not tell
whether two [] or () ranges were adjacent, or be able to coalesce an
array of ranges.
This statement seems to me to demonstrate that you don't actually
understand the concept of open and closed ranges. It has nothing
whatsoever to do with assuming that the data type is discrete;
these concepts are perfectly well defined for the reals, for example.
What it is about is whether the inclusion conditions are "< bound"
or "<= bound".
regards, tom lane
Because intervals (mathematical not SQL) can be open or closed at each
end point we need to know what the next and previous value would be at
the specified granularity. And while you can do some operations without
knowing this, there are many you can't. For instance you could not tell
whether two [] or () ranges were adjacent, or be able to coalesce an
array of ranges.
This statement seems to me to demonstrate that you don't actually
understand the concept of open and closed ranges. It has nothing
whatsoever to do with assuming that the data type is discrete;
these concepts are perfectly well defined for the reals, for example.
What it is about is whether the inclusion conditions are "< bound"
or "<= bound".
IMHO the first question is whether, for integers, [1,2] UNION [3,5]
should be equal to [1,5]. In math this is certainly true, and defining
'next' seems like a reasonable way to establish this in postgres.
The next question is whether, for floats, [1,3-FLT_EPSILON] UNION
[3,5] should be [1,5].
And the next question is whether, for numeric(6,2), [1,2.99] UNION
[3,5] should be [1,5].
FWIW, I would answer yes, no, yes to those three questions.
-Nathan
On Mon, 2009-12-14 at 11:25 -0500, Tom Lane wrote:
Scott Bailey <artacus@comcast.net> writes:
Because intervals (mathematical not SQL) can be open or closed at each
end point we need to know what the next and previous value would be at
the specified granularity. And while you can do some operations without
knowing this, there are many you can't. For instance you could not tell
whether two [] or () ranges were adjacent, or be able to coalesce an
array of ranges.
This statement seems to me to demonstrate that you don't actually
understand the concept of open and closed ranges. It has nothing
whatsoever to do with assuming that the data type is discrete;
these concepts are perfectly well defined for the reals, for example.
What it is about is whether the inclusion conditions are "< bound"
or "<= bound".
Of course you can still define the obvious "contains" and "overlaps"
operators for a continuous range. But there are specific differences in
the API between a discrete range and a continuous range, which is what
Scott was talking about.
1. With discrete ranges, you can change the displayed
inclusivity/exclusivity without changing the value. For instance in the
integer domain, [ 5, 10 ] is the same value as [ 5, 11 ). This works on
both input and output. It is not possible to change the display for
continuous ranges.
2. With discrete ranges, you can get the last point before the range,
the first point in the range, the last point in the range, and the first
point after the range. These are more than enough to describe the range
completely. For continuous ranges, those functions will fail depending
on the inclusivity/exclusivity of the range. Practically speaking, you
would want to have a different set of accessors: start_inclusive(),
start_point(), end_point(), and end_inclusive(). However, those
functions can't be usefully defined on a discrete range.
We can't choose the API for continuous ranges as the API for discrete
ranges as well. If we did, then we would think that [5, 10] and [5, 11)
were not equal, but they are. Similarly, we would think that [5, 10] and
[11, 12] were not adjacent, but they are.
So there are certainly some user-visible API differences between the
two, and I don't think those differences can be hidden.
Regards,
Jeff Davis
On Mon, 2009-12-14 at 09:58 -0500, Tom Lane wrote:
In particular, the granularity examples you give seem to assume that
the underlying datatype is exact not approximate --- which among other
things will mean that it fails to work for float timestamps. Since
timestamps are supposedly the main use-case, that's pretty troubling.
Additionally, granularity for timestamps is not particularly useful when
you get to things like "days" and "months" which don't have a clean
algebra.
Is the granule there only to try to support continuous ranges? If so, I
don't think it's necessary if we expose the API differences I outlined
in another email in this thread. Also, that would mean that we don't
need a granule for float, because we can already treat it as discrete*.
Regards,
Jeff Davis
*: nextafter() allows you to increment or decrement a double (loosely
speaking), and according to the man page it's part of C99 and
POSIX.1-2001.
Nathan Boley <npboley@gmail.com> writes:
This statement seems to me to demonstrate that you don't actually
understand the concept of open and closed ranges.
IMHO the first question is whether, for integers, [1,2] UNION [3,5]
should be equal to [1,5]. In math this is certainly true, and defining
'next' seems like a reasonable way to establish this in postgres.
Well, that's nice to have (not essential) for data types that actually
are discrete. It's not a sufficient argument for creating a definition
that is flat out broken for non-discrete datatypes.
It might be worth pointing out here that the concept of an open interval
was only invented in the first place for dealing with a continuum.
If you could assume the underlying set is discrete, every open interval
could be replaced with a closed one, using the next or prior value as
the bound instead. There would be no need for two kinds of interval.
If you are intending to support UNION on intervals, you are going to
need some more-general representation anyway. (I trust you don't think
we will accept a datatype for which [1,2] UNION [3,5] works but
[1,2] UNION [4,5] throws an error.) So whether the code can reduce the
result of a union to a single range or has to keep it as two ranges is
only an internal optimization issue anyhow, not something that should
drive an artificial (and infeasible) attempt to force every domain to be
discrete.
regards, tom lane
On Mon, 2009-12-14 at 10:00 -0800, Nathan Boley wrote:
IMHO the first question is whether, for integers, [1,2] UNION [3,5]
should be equal to [1,5]. In math this is certainly true, and defining
'next' seems like a reasonable way to establish this in postgres.
[ you say "yes" ]
Agreed.
The next question is whether, for floats, [1,3-FLT_EPSILON] UNION
[3,5] should be [1,5].
[ you say "no" ]
I think this should be true, because all floats between 1 and 5 are
contained. I don't feel too strongly about this, so I would not complain
if floats were treated as continuous.
And the next question is whether, for numeric(6,2), [1,2.99] UNION
[3,5] should be [1,5].
[ you say "yes" ]
I almost agree. Unfortunately, typmod isn't really a part of the type,
it just affects input/output. So, we can't really use it that way -- as
Tom points out, typmod is not passed along to functions that take the
value.
But if it were a part of the type, then I would certainly agree.
Regards,
Jeff Davis
Jeff Davis wrote:
So there are certainly some user-visible API differences between the
two, and I don't think those differences can be hidden.
ISTM you are saying we should have two types of range types.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Jeff Davis <pgsql@j-davis.com> writes:
On Mon, 2009-12-14 at 11:25 -0500, Tom Lane wrote:
This statement seems to me to demonstrate that you don't actually
understand the concept of open and closed ranges.
Of course you can still define the obvious "contains" and "overlaps"
operators for a continuous range. But there are specific differences in
the API between a discrete range and a continuous range, which is what
Scott was talking about.
Well, if the intention is to invent two different kinds of ranges, with
different operators, for continuous and discrete data types, then fine.
But the original post suggested no such thing, and provided (unworkable)
examples suggesting that the intent was to force every type to be
treated as discrete whether that makes any sense or not.
The main question I would have is how to tell whether the underlying
type is really discrete. If we allow people to define things like
"range over float4 with 0.000001 step", then somebody will try to do it
--- and file bugs when it doesn't work sanely. I don't think there is
anything in our API for user-defined types that really tells you whether
it's an exact or approximate type.
(Also, stuff like strings simply doesn't have any sane concept of a
unique next or previous value. I think the number of types that are
really discrete in this sense is very small, like maybe just ints and
enums.)
regards, tom lane
Tom Lane wrote:
Scott Bailey <artacus@comcast.net> writes:
Because intervals (mathematical not SQL) can be open or closed at each
end point we need to know what the next and previous value would be at
the specified granularity. And while you can do some operations without
knowing this, there are many you can't. For instance you could not tell
whether two [] or () ranges were adjacent, or be able to coalesce an
array of ranges.
This statement seems to me to demonstrate that you don't actually
understand the concept of open and closed ranges. It has nothing
whatsoever to do with assuming that the data type is discrete;
these concepts are perfectly well defined for the reals, for example.
What it is about is whether the inclusion conditions are "< bound"
or "<= bound".
I won't address how you draw your conclusions here. But I find it
'interesting' that you assume that I don't know what I'm talking about
rather than assume you don't fully understand what I'm talking about.
Anyhow. For any given range there may be 4 combinations of values: either
the first value included in the range '[' or the last value preceding
the start of the range '('; and the last value included in the range ']'
or the first value following the end of the range ')'. We aren't going
to store all four data points, so we need to normalize into the most
common form, a half-open interval [), and store just those two values:
the first value included in the range and the first value after the end
of our range.
So let's say you are using a numeric range to model the high and low
values of stocks trading on a given day. Now imagine doing this with no
concept of granularity. You will most likely be given a range [low,
high] with inclusive end points. So how do you convert that to a
closed-open interval so you can store it? Is 20.42000000000000000001 the
next value after 20.42? Probably not. You are going to want to define
0.01 as the granularity (for either the type or the column) so that 20.43 is.
Or again: are the ranges [14.0, 22.0] and [22.1, 29.0] adjacent? Maybe,
maybe not. There is no way to tell without knowing the granularity. Perhaps
the granularity is 0.000000001 and there are a billion values that are
not included. Or perhaps the granularity is 0.1 and they are adjacent.
Scott