Range Types and extensions

Started by Jeff Davisalmost 15 years ago59 messageshackers
Jump to latest
#1Jeff Davis
pgsql@j-davis.com

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?

1. ANYRANGE + CREATE TYPE ... AS RANGE
--------------------------------------
This is the most compelling, in my opinion. People can define new range
functions and new range types independently and each one gets the
benefit of the other automatically. Without this, there will be an
explosion of functions and a bunch of inconsistencies like functions
that support most range types but not all (merely because the function
author didn't know that the type existed).

In the several talks that I've given, a common question is related to
"multiranges" (ranges with holes). These get a little complex, and I
don't have a complete answer. However, multiranges can be approximated
with ordered arrays of non-overlapping, non-adjacent ranges. If someone
wants to take it upon themselves to develop a set of operators here,
that would be great -- but without ANYRANGE the operators would be
unmanageable.

2. Documentation and Tests
--------------------------
Let's say we take a minimalist view, and only have ANYRANGE and CREATE
TYPE ... AS RANGE in core; and leave the rest as an extension.

What exactly would the documentation say? I think it would be even more
hypothetical and abstract than the documentation for Exclusion
Constraints. So, there is a certain documentation advantage to having at
least enough functionality to allow someone to try out the feature.

And the tests for such a minimalist feature would be a significant
challenge -- what do we do there? Get pg_regress to load the extension
from PGXN?

3. Quality
----------
PostgreSQL has a great reputation for quality, and for good reason. But
extensions don't follow the same quality-control standards; and even if
some do, there is no visible stamp of approval. So, to ask someone to
use an extension means that they have to evaluate the quality for
themselves, which is a pretty high barrier.

Since PGXN (thanks David Wheeler) and EXTENSIONs (thanks Dmitri) solve
many of the other issues, quality control is one of the biggest ones
remaining. I still get questions about when the temporal type will be
"in core", and I think this is why.

I don't think this is a good excuse to put it in core though. We need to
solve this problem, and the best way to start is by getting
well-reviewed, high-quality extensions out there.

4. Future work -- RANGE KEY, RANGE FOREIGN KEY, RANGE MERGE JOIN, etc.
---------------------------------
There are a few aspects of range types that aren't in the first patch,
but are fairly obvious follow-up additions. These will require some
knowledge about ranges in the backend, like finding the "overlaps"
operator for a range. The current patch provides this knowledge by
providing a built-in overlaps operator for ANYRANGE. This would be a
non-issue if we had a good type interface system (that works on
polymorphic types) -- we could just have a built-in "range" interface,
and the range extension could add "&&" as the range interface's overlaps
operator for the type ANYRANGE.

=================================

So, where on this spectrum should range types fall? I think the most
minimalist would be to only support #1 (and the necessary type IO
functions); and leave all other functions, operators, and opclasses to
an extension. That has a lot of appeal, but I don't think we can ignore
the challenges above.

On the other hand, trying to make it a complete feature in core has
challenges as well. For instance, even with Range Types, Exclusion
Constraints aren't practical out-of-the-box unless we also have
BTree-GiST in core. So there's a snowball effect.

There might also be some middle ground, where its like the minimalist
approach, but with a few very basic constructors and accessors. That
would at least make it easier to test, but then to be actually useful
(with index support, operators, fancy functions, etc.) you'd need the
extension.

Thoughts?

Regards,
Jeff Davis

#2Darren Duncan
darren@darrenduncan.net
In reply to: Jeff Davis (#1)
Re: Range Types and extensions

Jeff Davis wrote:

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?

I believe that ranges aka intervals are widely useful generic types, next after
relations/tuples/arrays, and they *should* be supported in core, same as arrays are.

In particular, the usefulness of ranges/intervals is often orthogonal to many
other things, and for many types including numbers, strings, temporals.

Now assuming that a range/interval value is generally defined in terms of a pair
of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or
{<,>,<=,>=} etc or LIMIT makes sense), it will be essential that this value is
capable of distinguishing open and closed intervals.

For example, a range value can be represented by a tuple with 4 attributes,
where two of those are the endpoint values, and two of those are booleans saying
whether each of the endpoints is inside or outside the range/interval.

Also, if Postgres has some concept of type-generic special values -Inf and +Inf
(which always sort before or after any other value in the type system), those
can be used as endpoints to indicate that the interval is unbounded.

Unless you have some other syntax in mind, I suggest lifting the range literal
syntax from Perl 6, where ".." is an infix operator building a range between its
arguments, and a "^" on either side means that side is open, I think; so there
are 4 variants: {..,^..,..^,^..^}.

Now as to general usefulness of intervals ...

Any operation that wants to deal with a range somehow, such as the BETWEEN
syntax, could instead use a range/interval; for example, both of:

foo in 1..10

foo between 1 and 10

... would mean the same thing, but the 1..10 can be replaced by an arbitrary
value expression or variable reference.

Likewise with:

date in start ..^ end

date >= start and date < end

... mean the same thing.

The LIMIT clause could take a range to specify take and skip count at once.

Array slicing can be done using foo[first..last] or such.

A random number generator that takes endpoints can take a range argument.

An array or relation of these range can represent ranges with holes, and the
general results of range union operations.

-- Darren Duncan

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Darren Duncan (#2)
Re: Range Types and extensions

2011/6/6 Darren Duncan <darren@darrenduncan.net>:

Jeff Davis wrote:

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?

it should be supported by FOREACH statement in PL/pgSQL

Pavel

#4Darren Duncan
darren@darrenduncan.net
In reply to: Pavel Stehule (#3)
Re: Range Types and extensions

Pavel Stehule wrote:

2011/6/6 Darren Duncan <darren@darrenduncan.net>:

Jeff Davis wrote:

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?

it should be supported by FOREACH statement in PL/pgSQL

Yes, absolutely. I know this feature is loved in Perl. But this usage would
only work for a more limited range of data types, namely those over which one
can build a sequence generator, such as integers, because they have a
next-value/prev-value function defined. In other words, while range types in
general would work for any ordered type, FOREACH would only work for the subset
of those that are ordinal types. -- Darren Duncan

#5Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Jeff Davis (#1)
Re: Range Types and extensions

Jeff Davis <pgsql@j-davis.com> writes:

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

That's a good question :)

I think the way things are going to be organised now is that we will
have core-blessed extensions: don't mix the mechanism and the policy.

non-issue if we had a good type interface system (that works on
polymorphic types) -- we could just have a built-in "range" interface,
and the range extension could add "&&" as the range interface's overlaps
operator for the type ANYRANGE.

That means that this is, IMHO, the right approach. Have core support
that enables user defined RANGE types with indexing and planner support,
etc, like we have OPERATOR CLASS and FAMILY and all the jazz.

And the useful stuff you need to have to benefit from that core support
would be an extension. It could be a core maintained extension, and it
could even get installed by default, so that all the users would need to
do is 'CREATE EXTENSION timeranges;', for example.

So, I see us getting those different kinds of extensions in the future:

a. core extensions, shipped by default
b. contribs, not shipped by default, maintained by core hackers
c. examples, included in the source code only, maintained as contribs
d. “trusted network” of extensions (pgdg, pgxn, debian, privates, etc)
e. external independent extensions, just as now

The other main difference between a core extension and a contrib will be
where it's documented. Greg Smith proposed a patch wherein he moved
some contribs to a new extension/ place, and had them auto installed.

I think the consensus is to instead add a new chapter (maybe between
current chapters 9. Functions and Operators and 10. Type Conversion) and
host “core extensions” docs there. The source code organisation is
controversial because technically not necessary. We have to keep the
work Greg did to keep those contribs shipped by default. Oh, and that
is on the 9.1 Open Items, right?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#6Jeff Davis
pgsql@j-davis.com
In reply to: Pavel Stehule (#3)
Re: Range Types and extensions

On Mon, 2011-06-06 at 06:56 +0200, Pavel Stehule wrote:

2011/6/6 Darren Duncan <darren@darrenduncan.net>:

Jeff Davis wrote:

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?

it should be supported by FOREACH statement in PL/pgSQL

Oh, good idea. It would only work for discrete ranges though.

However, I would need to somehow reintroduce the concept of "next",
which has some hazards to it (as Tom pointed out, we don't want someone
to define the "next" for a float to be "+1.0"). I'll have to think about
this.

Regards,
Jeff Davis

#7Jeff Davis
pgsql@j-davis.com
In reply to: Darren Duncan (#2)
Re: Range Types and extensions

On Sun, 2011-06-05 at 21:51 -0700, Darren Duncan wrote:

Jeff Davis wrote:

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?

I believe that ranges aka intervals are widely useful generic types, next after
relations/tuples/arrays, and they *should* be supported in core, same as arrays are.

I think we all agree that ranges are important. I am not suggesting that
we sacrifice on the semantics to make it an extension; I'm just trying
to see if involving extensions for some of the approximately 5000 lines
would be a good idea.

Now assuming that a range/interval value is generally defined in terms of a pair
of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or
{<,>,<=,>=} etc or LIMIT makes sense), it will be essential that this value is
capable of distinguishing open and closed intervals.

Right, it already does that explicitly. I'd appreciate your input on
some of the previous discussion though.

Also, if Postgres has some concept of type-generic special values -Inf and +Inf
(which always sort before or after any other value in the type system), those
can be used as endpoints to indicate that the interval is unbounded.

I already introduced +/- infinity to range types. They are not generic
outside of ranges, however -- therefore you can't select the upper bound
of an upper-infinite range.

Unless you have some other syntax in mind, I suggest lifting the range literal
syntax from Perl 6, where ".." is an infix operator building a range between its
arguments, and a "^" on either side means that side is open, I think; so there
are 4 variants: {..,^..,..^,^..^}.

Oh, interesting syntax. That might make a good operator version of a
constructor. Unfortunately, "." is not valid in an operator name in PG.
Maybe I can use tilde or dash?

Any operation that wants to deal with a range somehow, such as the BETWEEN
syntax, could instead use a range/interval; for example, both of:

foo in 1..10

I don't know if it's reasonable to introduce syntax like "in" here.
Maybe we could just still use "between" and it would recognize that the
RHS is a range?

The LIMIT clause could take a range to specify take and skip count at once.

Interesting idea.

Array slicing can be done using foo[first..last] or such.

I like that, but we already have foo[3:7], so it might be better not to
introduce redundancy. Too bad I can't use ":" as an operator.

A random number generator that takes endpoints can take a range argument.

Sounds useful because it would make it more explicit whether the
endpoints are possible results.

An array or relation of these range can represent ranges with holes, and the
general results of range union operations.

Right, that's been brought up before as well. In particular, Scott
Bailey has done some thinking/writing on this topic.

Regards,
Jeff Davis

#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Jeff Davis (#6)
Re: Range Types and extensions

2011/6/6 Jeff Davis <pgsql@j-davis.com>:

On Mon, 2011-06-06 at 06:56 +0200, Pavel Stehule wrote:

2011/6/6 Darren Duncan <darren@darrenduncan.net>:

Jeff Davis wrote:

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?

it should be supported by FOREACH statement in PL/pgSQL

Oh, good idea. It would only work for discrete ranges though.

However, I would need to somehow reintroduce the concept of "next",
which has some hazards to it (as Tom pointed out, we don't want someone
to define the "next" for a float to be "+1.0"). I'll have to think about
this.

we can define a step

FOREACH x IN RANGE ..... BY ....
LOOP
END LOOP

Regards

Pavel

Show quoted text

Regards,
       Jeff Davis

#9Jeff Davis
pgsql@j-davis.com
In reply to: Dimitri Fontaine (#5)
Re: Range Types and extensions

On Mon, 2011-06-06 at 14:42 +0200, Dimitri Fontaine wrote:

I think the way things are going to be organised now is that we will
have core-blessed extensions: don't mix the mechanism and the policy.

I like that idea.

non-issue if we had a good type interface system (that works on
polymorphic types) -- we could just have a built-in "range" interface,
and the range extension could add "&&" as the range interface's overlaps
operator for the type ANYRANGE.

That means that this is, IMHO, the right approach. Have core support
that enables user defined RANGE types with indexing and planner support,
etc, like we have OPERATOR CLASS and FAMILY and all the jazz.

If we take the minimal approach, the index support would be the first to
be moved to an extension. In order to have index support in core, we
need quite a few functions and a significant amount of code.

Minimal would be:
* CREATE TYPE ... AS RANGE
* ANYRANGE
* The IO functions
* Possibly the constructors and accessors ( range(),
range_oc(), lower(), upper(), etc.)

Regarding the type interfaces, the only thing that really worries me
there is that my future work will depend on them existing, and I haven't
really thought through the details. For instance, it just occurred to me
recently that it would need to support polymorphic types, which might be
a little bit more complicated than a simple lookup.

I suppose it's easier to put a few functions in core later if we get
stuck than to rip them out later.

And the useful stuff you need to have to benefit from that core support
would be an extension. It could be a core maintained extension, and it
could even get installed by default, so that all the users would need to
do is 'CREATE EXTENSION timeranges;', for example.

Sounds good to me. However, would the extension be available in
pg_regress? If not, I will need to include those constructors/accessors
to be able to test anything.

I think the consensus is to instead add a new chapter (maybe between
current chapters 9. Functions and Operators and 10. Type Conversion) and
host “core extensions” docs there. The source code organisation is
controversial because technically not necessary. We have to keep the
work Greg did to keep those contribs shipped by default. Oh, and that
is on the 9.1 Open Items, right?

OK, so there are still a few things to be decided around documentation
and tests. Both of those things can take a significant amount of time to
rework, so I think I'll leave it alone until we have more of a
consensus.

We still have time before 9.2 to break some of the code out into an
extension when we do have the doc/test issues resolved.

Regards,
Jeff Davis

#10Chris Browne
cbbrowne@acm.org
In reply to: Jeff Davis (#1)
Re: Range Types and extensions

On Sun, Jun 5, 2011 at 6:59 PM, Jeff Davis <pgsql@j-davis.com> wrote:

There might also be some middle ground, where its like the minimalist
approach, but with a few very basic constructors and accessors. That
would at least make it easier to test, but then to be actually useful
(with index support, operators, fancy functions, etc.) you'd need the
extension.

Thoughts?

I can see merit to having parts of RANGE implemented in core, along
with some of the "usage parts" implemented as extensions, so that if
I'm not actually using (say) INET ranges, then the database isn't
cluttered up with all the functions and operators for INET ranges.

How to slice it apart into an appropriate admixture of core and
extensions is a good question, though it seems pretty likely that
having an extension for each data type that is to be mixed into a
range is a reasonable way to go.

I think this also can make some would-be arguments against RANGE go away...

"I hate that this RANGE extension means we have to draw 5000 lines of
code into every database, and draws in 275 operator functions"
evaporates if the "base" part is entirely smaller, and if you only
draw in all the functions and operators if you request loading of each
of the 17 extensions.

Per-type extensions offers a pretty natural partitioning of the code
for each type, which seems pretty good.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

#11Jeff Davis
pgsql@j-davis.com
In reply to: Pavel Stehule (#8)
Re: Range Types and extensions

On Mon, 2011-06-06 at 18:28 +0200, Pavel Stehule wrote:

we can define a step

FOREACH x IN RANGE ..... BY ....

That wouldn't need any of the range infrastructure at all -- it would be
purely syntactic, right?

Regards,
Jeff Davis

#12Jeff Davis
pgsql@j-davis.com
In reply to: Chris Browne (#10)
Re: Range Types and extensions

On Mon, 2011-06-06 at 16:45 +0000, Christopher Browne wrote:

How to slice it apart into an appropriate admixture of core and
extensions is a good question, though it seems pretty likely that
having an extension for each data type that is to be mixed into a
range is a reasonable way to go.

...

Per-type extensions offers a pretty natural partitioning of the code
for each type, which seems pretty good.

Ideally, most range types can be created with a simple:

CREATE TYPE foorange AS RANGE (subtype=foo);

There might be a few subtype-specific functions, like the canonical
function, but overall it should be a small amount of code per range.
However, I'd say just bundle a bunch of rangetypes together in one
extension. There's not really much cost -- if you are using one range
type, you'll use a few more.

Regards,
Jeff Davis

#13Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#1)
Re: Range Types and extensions

On Sun, Jun 5, 2011 at 2:59 PM, Jeff Davis <pgsql@j-davis.com> wrote:

So, where on this spectrum should range types fall? I think the most
minimalist would be to only support #1 (and the necessary type IO
functions); and leave all other functions, operators, and opclasses to
an extension. That has a lot of appeal, but I don't think we can ignore
the challenges above.

On the other hand, trying to make it a complete feature in core has
challenges as well. For instance, even with Range Types, Exclusion
Constraints aren't practical out-of-the-box unless we also have
BTree-GiST in core. So there's a snowball effect.

There might also be some middle ground, where its like the minimalist
approach, but with a few very basic constructors and accessors. That
would at least make it easier to test, but then to be actually useful
(with index support, operators, fancy functions, etc.) you'd need the
extension.

I don't have clear feeling on this question in general, but if we're
going to break this up into pieces, it's important that they be
logical pieces. Putting half the feature in core and half into an
extension just because we can will simplify complicate code
maintenance to no good end. The snowball effect is something to
avoid, and we need to watch out for that, but if the upshot of putting
part of it in core is that the core code can no longer be understood
or maintained because it depends heavily on a bunch of non-core code,
that's not helpful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Pavel Stehule
pavel.stehule@gmail.com
In reply to: Jeff Davis (#11)
Re: Range Types and extensions

2011/6/6 Jeff Davis <pgsql@j-davis.com>:

On Mon, 2011-06-06 at 18:28 +0200, Pavel Stehule wrote:

we can define a step

FOREACH x IN RANGE ..... BY ....

That wouldn't need any of the range infrastructure at all -- it would be
purely syntactic, right?

I don't think. For lot of types the specification of a step is
necessary - for date, for float.

Pavel

Show quoted text

Regards,
       Jeff Davis

#15Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Robert Haas (#13)
Re: Range Types and extensions

On Mon, Jun 06, 2011 at 12:53:49PM -0400, Robert Haas wrote:

I don't have clear feeling on this question in general, but if we're
going to break this up into pieces, it's important that they be
logical pieces. Putting half the feature in core and half into an
extension just because we can will simplify complicate code
maintenance to no good end. The snowball effect is something to
avoid, and we need to watch out for that, but if the upshot of putting
part of it in core is that the core code can no longer be understood
or maintained because it depends heavily on a bunch of non-core code,
that's not helpful.

And concretely, code paths that cannot be exercised easily from
core-only code will not get regression tested, and will therefore rot.

Ross
--
Ross Reedstrom, Ph.D. reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist phone: 713-348-6166
Connexions http://cnx.org fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE

#16Stephen Frost
sfrost@snowman.net
In reply to: Dimitri Fontaine (#5)
Re: Range Types and extensions

* Dimitri Fontaine (dimitri@2ndQuadrant.fr) wrote:

That means that this is, IMHO, the right approach. Have core support
that enables user defined RANGE types with indexing and planner support,
etc, like we have OPERATOR CLASS and FAMILY and all the jazz.

Yes, we do, however..

And the useful stuff you need to have to benefit from that core support
would be an extension. It could be a core maintained extension, and it
could even get installed by default, so that all the users would need to
do is 'CREATE EXTENSION timeranges;', for example.

I don't like the idea of having a capability which is not utilized
in core. We should make it so extensions can *also* have access to
define their own, but we should have the basics covered in core.

a. core extensions, shipped by default

Having it as a core extension might work, but I'm not really 'sold' on
it.

Thanks,

Stephen

#17Merlin Moncure
mmoncure@gmail.com
In reply to: Jeff Davis (#1)
Re: Range Types and extensions

On Sun, Jun 5, 2011 at 1:59 PM, Jeff Davis <pgsql@j-davis.com> wrote:

In the several talks that I've given, a common question is related to
"multiranges" (ranges with holes). These get a little complex, and I
don't have a complete answer. However, multiranges can be approximated
with ordered arrays of non-overlapping, non-adjacent ranges. If someone
wants to take it upon themselves to develop a set of operators here,
that would be great -- but without ANYRANGE the operators would be
unmanageable.

2. Documentation and Tests
--------------------------
Let's say we take a minimalist view, and only have ANYRANGE and CREATE
TYPE ... AS RANGE in core; and leave the rest as an extension.

What exactly would the documentation say? I think it would be even more
hypothetical and abstract than the documentation for Exclusion
Constraints. So, there is a certain documentation advantage to having at
least enough functionality to allow someone to try out the feature.

And the tests for such a minimalist feature would be a significant
challenge -- what do we do there? Get pg_regress to load the extension
from PGXN?

3. Quality
----------
PostgreSQL has a great reputation for quality, and for good reason. But
extensions don't follow the same quality-control standards; and even if
some do, there is no visible stamp of approval. So, to ask someone to
use an extension means that they have to evaluate the quality for
themselves, which is a pretty high barrier.

Since PGXN (thanks David Wheeler) and EXTENSIONs (thanks Dmitri) solve
many of the other issues, quality control is one of the biggest ones
remaining. I still get questions about when the temporal type will be
"in core", and I think this is why.

I don't think this is a good excuse to put it in core though. We need to
solve this problem, and the best way to start is by getting
well-reviewed, high-quality extensions out there.

4. Future work -- RANGE KEY, RANGE FOREIGN KEY, RANGE MERGE JOIN, etc.
---------------------------------
There are a few aspects of range types that aren't in the first patch,
but are fairly obvious follow-up additions. These will require some
knowledge about ranges in the backend, like finding the "overlaps"
operator for a range. The current patch provides this knowledge by
providing a built-in overlaps operator for ANYRANGE. This would be a
non-issue if we had a good type interface system (that works on
polymorphic types) -- we could just have a built-in "range" interface,
and the range extension could add "&&" as the range interface's overlaps
operator for the type ANYRANGE.

=================================

So, where on this spectrum should range types fall? I think the most
minimalist would be to only support #1 (and the necessary type IO
functions); and leave all other functions, operators, and opclasses to
an extension. That has a lot of appeal, but I don't think we can ignore
the challenges above.

On the other hand, trying to make it a complete feature in core has
challenges as well. For instance, even with Range Types, Exclusion
Constraints aren't practical out-of-the-box unless we also have
BTree-GiST in core. So there's a snowball effect.

There might also be some middle ground, where its like the minimalist
approach, but with a few very basic constructors and accessors. That
would at least make it easier to test, but then to be actually useful
(with index support, operators, fancy functions, etc.) you'd need the
extension.

Thoughts?

ISTM (I haven't followed all the lead up so apologies if this is
already covered) a range is a 3rd pseudo 'container' type (the other
two being composites and arrays). Do you see:

*) being able to make arrays of ranges/ranges of arrays?
*) range of composites?

I vote for at minimum the type itself and ANYRANGE to be in core.

From there you could make it like arrays where the range type is

automatically generated for each POD type. I would consider that for
sure on basis of simplicity in user-land unless all the extra types
and operators are a performance hit.

A clean and highly usable implementation in the type system in the
spirit of arrays would be fantastic. I'm particularly interested in
hypothetical constructor/destructor and in/out mechanics...an 'unnest'
like function, a range(a,b,c) that does as row(a,b,c) does, etc,
especially if you can work it out so that everything is not hammered
through textual processing.

merlin

#18Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Stephen Frost (#16)
Re: Range Types and extensions

Stephen Frost <sfrost@snowman.net> writes:

I don't like the idea of having a capability which is not utilized
in core. We should make it so extensions can *also* have access to
define their own, but we should have the basics covered in core.

Well if another part of core depends on the feature set, then of course
you don't have a choice to make it an extension any more. I think
that's where I would draw the line.

Having it as a core extension might work, but I'm not really 'sold' on
it.

Well, core extension means built by default, part of default regression
tests and all. The regression test simply begins with the create
extension stanza, that's about it.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#19Darren Duncan
darren@darrenduncan.net
In reply to: Jeff Davis (#7)
Re: Range Types and extensions

Jeff Davis wrote:

On Sun, 2011-06-05 at 21:51 -0700, Darren Duncan wrote:

Jeff Davis wrote:

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?

I believe that ranges aka intervals are widely useful generic types, next after
relations/tuples/arrays, and they *should* be supported in core, same as arrays are.

I think we all agree that ranges are important. I am not suggesting that
we sacrifice on the semantics to make it an extension; I'm just trying
to see if involving extensions for some of the approximately 5000 lines
would be a good idea.

Generally speaking, the best way to go about this is to define the *generic*
data type in the core, and leave most operators to extensions. So, in core, we
need to have the way to select a range value over ANYTYPE either completely as a
value literal or in terms of endpoint values from arbitrary expressions or
variables, store the range value in a database, retrieve it, and access its
component attributes (endpoints, open/closed) in user-defined constraint and
operator definitions.

The fundamental value of ranges is the fact that they're a concise way to store
and express an interval over an ordered type, and to either compare such
intervals or test whether individual values or sets of values are in intervals.
And people do that a *lot* (such as with dates), so I see having this range
type, which is generic and orthogonal to other types in the same way as arrays
or tables are, in core just makes the most sense, and as previously illustrated,
ranges are useful in places one might not always think about.

Ranges are also much more flexible than BETWEEN for what it does, because AFAIK
you can't indicate open or closed with BETWEEN.

You should not need to define separate range types or operators for each ordered
type, same as you should not have to do so for arrays, or where such
functionality is defined should be similar; whatever functionality for arrays
you do or don't define in core, do corresponding things for ranges.

Now assuming that a range/interval value is generally defined in terms of a pair
of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or
{<,>,<=,>=} etc or LIMIT makes sense), it will be essential that this value is
capable of distinguishing open and closed intervals.

Right, it already does that explicitly. I'd appreciate your input on
some of the previous discussion though.

On this note, here's a *big* thing that needs discussion ...

Citing this whole FOREACH talk, we need to recognize that this talk about ranges
is actually being overloaded for 2 very distinct concepts, which are probably
best dealt with separately, possibly as distinct types.

This discussion came up in the development of Perl 6 too, and that discussion is
probably worth looking into.

Ranges/intervals in the general sense can *not* be used to enumerate a list of
values in a standard type-sensical manner, such as FOREACH requires.
Ranges/intervals are about *comparison*, meaning combinations of tests of how 2
arbitrary values of an ordered type sort relative to each other, and that's it.
This usage works for integers, other numbers, strings, dates, and so on, all
in a natural manner.

Value enumeration, such as in a FOREACH, is a *separate* concept.

The comparison and enumeration tasks have distinct sets of operators and are
used in distinct contexts. Enumeration requires next/prev-value operators,
while ranges/intervals in general do not. Enumeration requires discrete types
(or the faking of such) like integers while ranges work for continuous types.

Moreover, in practice, one probably wants enumerations to be more flexible than
just monotonic increases. With enumerations you'd probably want to start go
top-down or bottom-up, you might want to increase geometrically or by some other
formula rather than incrementally.

I totally agree with sharing syntax and using ranges/intervals to define
sequence generators, but a range value should be considered immutable like a
number or string while a sequence generator may mutate.

For syntax, one could use "x..y" to define an interval while "x...y" for a
sequence generator, or that's what Perl 6 does.

See also http://perlcabal.org/syn/S03.html#Range_and_RangeIter_semantics that
talks about how Perl 6 does ranges.

Also, if Postgres has some concept of type-generic special values -Inf and +Inf
(which always sort before or after any other value in the type system), those
can be used as endpoints to indicate that the interval is unbounded.

I already introduced +/- infinity to range types. They are not generic
outside of ranges, however -- therefore you can't select the upper bound
of an upper-infinite range.

Well, what you have is the least one would want.

Unless you have some other syntax in mind, I suggest lifting the range literal
syntax from Perl 6, where ".." is an infix operator building a range between its
arguments, and a "^" on either side means that side is open, I think; so there
are 4 variants: {..,^..,..^,^..^}.

Oh, interesting syntax. That might make a good operator version of a
constructor. Unfortunately, "." is not valid in an operator name in PG.
Maybe I can use tilde or dash?

Can Pg be changed to support "." in operator names as long as they don't just
appear by themselves? What would this break to do so?

Any operation that wants to deal with a range somehow, such as the BETWEEN
syntax, could instead use a range/interval; for example, both of:

foo in 1..10

I don't know if it's reasonable to introduce syntax like "in" here.
Maybe we could just still use "between" and it would recognize that the
RHS is a range?

I believe it is quite reasonable to treat ranges like sets, in an abstract
sense, and so using set membership syntax like "in" is valid. Same as one
should be able to use "in" to test whether a value is in an array. I would
expect "in" to be a polymorphic infix operator same as "<" or "=" etc are,
aren't they? This shouldn't conflict with testing tuples in relations as they
are different types, same as you can use the same "<" for numbers and strings,
can't you?

We could add parenthesis if that helps:

foo in (1..10)

The LIMIT clause could take a range to specify take and skip count at once.

Interesting idea.

Array slicing can be done using foo[first..last] or such.

I like that, but we already have foo[3:7], so it might be better not to
introduce redundancy. Too bad I can't use ":" as an operator.

On that note, some languages use ":" for defining intervals rather than "..".

Some languages also use round parenthesis or curly braces to define intervals,
but I really don't like that and we shouldn't use it.

A random number generator that takes endpoints can take a range argument.

Sounds useful because it would make it more explicit whether the
endpoints are possible results.

Exactly.

An array or relation of these range can represent ranges with holes, and the
general results of range union operations.

Right, that's been brought up before as well. In particular, Scott
Bailey has done some thinking/writing on this topic.

I also see these as considerably less important and useful in practice than the
continuous intervals. Facilities for discontinuous intervals could more easily
be left to extensions than those for continuous ones. I see the continuous as
more fundamental, at least in the same manner as seeing integers as more
fundamental than rationals (you can define the latter with the former), though
one could define things in the opposite manner too.

Regards,
Jeff Davis

-- Darren Duncan

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#17)
Re: Range Types and extensions

Merlin Moncure <mmoncure@gmail.com> writes:

I vote for at minimum the type itself and ANYRANGE to be in core.
From there you could make it like arrays where the range type is
automatically generated for each POD type. I would consider that for
sure on basis of simplicity in user-land unless all the extra types
and operators are a performance hit.

Auto-generation of range types isn't going to happen, simply because the
range type needs more information than is provided by the base type
declaration. (First, you need a btree opclass, and second, you need a
"next" function if it's a discrete type.)

By my count there are only about 20 datatypes in core for which it looks
sensible to provide a range type (ie, it's a non-deprecated,
non-composite type with a standard default btree opclass). For that
many, we might as well just build 'em in.

regards, tom lane

#21Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#21)
#23Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#22)
#24Jeff Davis
pgsql@j-davis.com
In reply to: Darren Duncan (#19)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#24)
#26Darren Duncan
darren@darrenduncan.net
In reply to: Jeff Davis (#23)
#27Darren Duncan
darren@darrenduncan.net
In reply to: Jeff Davis (#24)
#28Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#23)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#28)
#30Florian Pflug
fgp@phlo.org
In reply to: Jeff Davis (#28)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Florian Pflug (#30)
#32Florian Pflug
fgp@phlo.org
In reply to: Robert Haas (#31)
#33Robert Haas
robertmhaas@gmail.com
In reply to: Florian Pflug (#32)
#34Darren Duncan
darren@darrenduncan.net
In reply to: Robert Haas (#33)
#35Robert Haas
robertmhaas@gmail.com
In reply to: Darren Duncan (#34)
#36Jeff Davis
pgsql@j-davis.com
In reply to: Florian Pflug (#30)
#37Florian Pflug
fgp@phlo.org
In reply to: Jeff Davis (#36)
#38Jeff Davis
pgsql@j-davis.com
In reply to: Florian Pflug (#37)
#39Florian Pflug
fgp@phlo.org
In reply to: Jeff Davis (#38)
#40Martijn van Oosterhout
kleptog@svana.org
In reply to: Florian Pflug (#39)
#41Jeff Davis
pgsql@j-davis.com
In reply to: Martijn van Oosterhout (#40)
#42Florian Pflug
fgp@phlo.org
In reply to: Jeff Davis (#41)
#43Jeff Davis
pgsql@j-davis.com
In reply to: Florian Pflug (#42)
#44Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#43)
#45David Fetter
david@fetter.org
In reply to: Jeff Davis (#43)
#46Florian Pflug
fgp@phlo.org
In reply to: Robert Haas (#44)
#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#44)
#48Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#47)
#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#48)
#50Jeff Davis
pgsql@j-davis.com
In reply to: Florian Pflug (#46)
#51Merlin Moncure
mmoncure@gmail.com
In reply to: Jeff Davis (#50)
#52Florian Pflug
fgp@phlo.org
In reply to: Merlin Moncure (#51)
#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#52)
#54Darren Duncan
darren@darrenduncan.net
In reply to: Tom Lane (#53)
#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: Darren Duncan (#54)
#56Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#55)
#57Darren Duncan
darren@darrenduncan.net
In reply to: Florian Pflug (#56)
#58Jeff Davis
pgsql@j-davis.com
In reply to: Darren Duncan (#57)
#59Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#53)