Range Types, constructors, and the type system

Started by Jeff Davisalmost 15 years ago59 messageshackers
Jump to latest
#1Jeff Davis
pgsql@j-davis.com

Different ranges over the same subtype make sense when using different
total orders for the subtype. This is most apparent with text collation,
but makes sense (at least mathematically, if not practically) for any
subtype.

For instance:
[a, Z)
is a valid range in "en_US", but not in "C", so it makes sense to have
multiple ranges over the same subtype with different collations.

But what if you have a function (like a constructor), of the form:
(anyelement, anyelement) -> anyrange
? To work with the type system, you need to be able to figure out the
return type from the arguments; which means to support functions like
this we need a mapping from the subtype to the range type.
Unfortunately, that restricts us to one range type per subtype (this
isn't a problem for ARRAYs, because there is only one useful array type
for a given element type).

This problem first came up a while ago:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php

My workaround was to use domains, but that's not a very clean solution
(you have to add a bunch of casts to make sure the right domain is
chosen). It became entirely unworkable with collations, because people
would be using different text collations a lot more frequently than,
say, a different ordering for timestamptz. Tom mentioned that here:

http://archives.postgresql.org/message-id/24831.1308579443@sss.pgh.pa.us

I think Florian proposed the most promising line of attack here:

http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org

by suggesting that functions of the form:
(anyelement, [other non-anyrange arguments]) -> anyrange
might be expendable. After all, they are only useful for constructors as
far as we can tell. Other range functions will have an anyrange
parameter, and we can use the actual type of the argument to know the
range type (as well as the subtype).

Although it's very nice to be able to say:
range(1,10)
and get an int4range out of it, it's not the only way, and it's not
without its problems anyway. For instance, to get an int8range you have
to do:
range(1::int8, 10::int8)
or similar.

So, we could just make functions like:
int4range(int4, int4)
int8range(int8, int8)
...
when creating the range type, and it would actually be a usability
improvement.

There are at least a few constructors that would need to be made for
each rangetype: the constructor above, the singleton constructor,
constructors that have infinite bounds, the empty constructor, and all
of the permutations for inclusivity/exclusivity. That adds up to quite a
few catalog entries per range type.

We could reduce some of the permutations by using extra arguments
somehow, but that seems like it adds to the ugliness. This might also be
a time to revisit whether there is a better way to present all of these
constructors (rather than the _[co][co] suffixes to names, etc.).

Even if we're willing to put up with a bunch of catalog entries, it will
take a little creativity to figure out how to run the functions
generically from a fixed set of C functions.

Are there other thoughts or ideas about eliminating the need for generic
constructors like range()?

Another idea Florian suggested (in the same email) was the ability to
declare the return type of a function, and then use the declared type to
infer the argument types. That would be nice because you would just have
to do:
range(1,10)::int8range
However, that's kind of backwards from how our type inference system
works now, and sounds like a big change.

Maybe we could consider a syntax addition for constructing range values?
That was kicked around briefly, but perhaps we should revisit the
possibilities there.

Regards,
Jeff Davis

#2Darren Duncan
darren@darrenduncan.net
In reply to: Jeff Davis (#1)
Re: Range Types, constructors, and the type system

To eludicate my earlier comments on this subject ...

I believe that the best general solution here is for every ordered base type to
just have a single total order, which is always used with that type in any
generic order-sensitive operation, including any ranges defined over it, or any
ORDER BY or any <,>,etc. The built-in generic text type would have exactly 1
system-defined collation that can't be changed, and it would be something simple
and generic, such as simply sorting on the codepoint as integers.

When we want to have some other "native" ordering for an existing type, such as
when we want to use a different text collation, we do this by creating a *new
base type*, using CREATE TYPE or some shorthand thereof, and this new type
defines its own ordering, such as that a particular text collation is used.

For example:

CREATE TYPE text__en_US AS (v text);

CREATE TYPE text__C AS (v text);

These will not compare equal to each other or to text, and that is good, because
having a different text collation implies that we consider 'foo'::text__en_US
and 'foo'::text__C to be different values.

I believe that any other approach is worse, and in particular I believe that
creating DOMAIN over text is worse, because DOMAIN are supposed to be subtypes
of some other type, whose set of member values is a subset of the other type's
values, and that have the same ordering. Multiple CREATE type over the same
base type don't interfere with each other like multiple DOMAIN could.

Assuming that what CREATE TYPE produces is actually a base type, I believe there
is no better solution using the facilities that SQL provides.

If there is concern about performance related to CREATE TYPE being a composite
type, I'm sure it is possible to engineer an optimization for when the type has
just 1 attribute so that performance isn't an issue. The main point I'm trying
to raise here is about semantics and good type systems.

Likewise, don't let concern about syntax for using values of such composite
types. Once again, there can be shorthands if necessary.

In fact, Postgres could provide a general shorthand for creating a composite
type of 1 attribute whose purpose is to make one type that is like but unequal
to another, and using this shorthand could also cause the composite type to
overload/polymorph all the operators of it's attribute type, so that the syntax
to define one is very short.

For example:

CREATE TYPE text__en_US WRAPS text COLLATE en_US;

... and I assume the name of that attribute would just be system-defined.

Note that the above is specific to wrapping text, and the COLLATE is just
shorthand for defining an ordering function for text__en_US. A more general
form could be:

CREATE TYPE bar WRAPS foo ORDER USING FUNCTION baz (lhs foo, rhs foo) ...;

And then we can say:

RANGE OF text__en_US

RANGE OF text

... similarly to how we declare array types with ARRAY.

One can also just define range values as they do array values, such as like this:

range('foo','bar') # default collation

range('foo'::text__en_US, 'bar'::text__en_US) # en_us collation

If that seems verbose, I have a few words for you:

1. Users should in practice name their wrapper types over their intended
meaning, not their mechanics, such as like this (not using text for variety),
and that may be more terse:

CREATE TYPE acct_num WRAPS integer; # inherits integer ordering by default

2. If the wrapper types overload the base operators, either automatically or
selectively (does it make sense to multiply an acct_num?), one doesn't have to
keep unpacking and packing them to use them in most cases. For example, I'd
expect many text wrappers to polymorph catenation or substring etc, so no extra
syntax.

3. In practice, most literal values come from applications and are given to SQL
code either as function parameters or bind parameter arguments. While lots of
example code may have literal values in it, I would think that most real-work
code would hardly have any, and hence you'd rarely see any 'foo'::text__en_US
for example. You'd more likely see the less common var::text__en_US or such.

So that's my position, CREATE TYPE on the regular types or the like is the best
solution, and anything else is an inferior solution.

Such a design is also how I do collations and ranges in my Muldis D language.

-- Darren Duncan

Jeff Davis wrote:

Show quoted text

Different ranges over the same subtype make sense when using different
total orders for the subtype. This is most apparent with text collation,
but makes sense (at least mathematically, if not practically) for any
subtype.

For instance:
[a, Z)
is a valid range in "en_US", but not in "C", so it makes sense to have
multiple ranges over the same subtype with different collations.

But what if you have a function (like a constructor), of the form:
(anyelement, anyelement) -> anyrange
? To work with the type system, you need to be able to figure out the
return type from the arguments; which means to support functions like
this we need a mapping from the subtype to the range type.
Unfortunately, that restricts us to one range type per subtype (this
isn't a problem for ARRAYs, because there is only one useful array type
for a given element type).

This problem first came up a while ago:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php

My workaround was to use domains, but that's not a very clean solution
(you have to add a bunch of casts to make sure the right domain is
chosen). It became entirely unworkable with collations, because people
would be using different text collations a lot more frequently than,
say, a different ordering for timestamptz. Tom mentioned that here:

http://archives.postgresql.org/message-id/24831.1308579443@sss.pgh.pa.us

I think Florian proposed the most promising line of attack here:

http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org

by suggesting that functions of the form:
(anyelement, [other non-anyrange arguments]) -> anyrange
might be expendable. After all, they are only useful for constructors as
far as we can tell. Other range functions will have an anyrange
parameter, and we can use the actual type of the argument to know the
range type (as well as the subtype).

Although it's very nice to be able to say:
range(1,10)
and get an int4range out of it, it's not the only way, and it's not
without its problems anyway. For instance, to get an int8range you have
to do:
range(1::int8, 10::int8)
or similar.

So, we could just make functions like:
int4range(int4, int4)
int8range(int8, int8)
...
when creating the range type, and it would actually be a usability
improvement.

There are at least a few constructors that would need to be made for
each rangetype: the constructor above, the singleton constructor,
constructors that have infinite bounds, the empty constructor, and all
of the permutations for inclusivity/exclusivity. That adds up to quite a
few catalog entries per range type.

We could reduce some of the permutations by using extra arguments
somehow, but that seems like it adds to the ugliness. This might also be
a time to revisit whether there is a better way to present all of these
constructors (rather than the _[co][co] suffixes to names, etc.).

Even if we're willing to put up with a bunch of catalog entries, it will
take a little creativity to figure out how to run the functions
generically from a fixed set of C functions.

Are there other thoughts or ideas about eliminating the need for generic
constructors like range()?

Another idea Florian suggested (in the same email) was the ability to
declare the return type of a function, and then use the declared type to
infer the argument types. That would be nice because you would just have
to do:
range(1,10)::int8range
However, that's kind of backwards from how our type inference system
works now, and sounds like a big change.

Maybe we could consider a syntax addition for constructing range values?
That was kicked around briefly, but perhaps we should revisit the
possibilities there.

Regards,
Jeff Davis

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Darren Duncan (#2)
Re: Range Types, constructors, and the type system

Darren Duncan <darren@darrenduncan.net> writes:

I believe that the best general solution here is for every ordered base type to
just have a single total order, which is always used with that type in any
generic order-sensitive operation, including any ranges defined over it, or any
ORDER BY or any <,>,etc.

We've spent years and blood on making sure that Postgres could support
multiple orderings for any datatype; and there are plenty of natural
examples for the usefulness of that. So I'm not at all impressed by
any line of reasoning that starts out by baldly throwing that away.

When we want to have some other "native" ordering for an existing type, such as
when we want to use a different text collation, we do this by creating a *new
base type*,

Nope. This has all sorts of problems that you're conveniently ignoring,
beginning with the need to duplicate all of the infrastructure for the
type (such as non-ordering-related operators), and then moving into
difficulties arising from added ambiguity as to which operator is meant.

regards, tom lane

#4Jeff Davis
pgsql@j-davis.com
In reply to: Darren Duncan (#2)
Re: Range Types, constructors, and the type system

On Sun, 2011-06-26 at 00:57 -0700, Darren Duncan wrote:

I believe that the best general solution here is for every ordered base type to
just have a single total order, which is always used with that type in any
generic order-sensitive operation, including any ranges defined over it, or any
ORDER BY or any <,>,etc. The built-in generic text type would have exactly 1
system-defined collation that can't be changed, and it would be something simple
and generic, such as simply sorting on the codepoint as integers.

Well, we're trying to support SQL, and SQL supports collations, so I
don't think we can just ignore that.

I also agree with Tom that it's not a good idea. My reasons are:

* Practical considerations, such as having a bunch of cruft from
duplicated types all over the system. With sufficient changes to the
type system, maybe that could be overcome. Or perhaps domains could be
used to make that work for range types (sort of), but the result would
not be very consistent with the rest of the system.

* It doesn't seem to be based in any mathematical argument. A type is a
set of values, and there's no reason it can't have several total orders;
or no total order at all. So it appears to just be piggybacking on the
type system infrastructure as a place to hold the metadata for a total
order.

* Who's to say that a "compare" function is the only way to specify a
total order? There might be other interfaces that would support
something closer to a lexicographic sort. So, from a theoretical
standpoint, trying to attach a single notion of total order to a type
seems strange, because there might be multiple interfaces for specifying
even one total order.

* It would require extra explicit type annotations. If you have 12 text
types, the only way to practically use any text type is to constantly
specify which more-specific text type it actually is (probably using
the :: operator). That is not necessarily a bad choice if starting a
language from scratch and forming the syntax in a way that it's
reasonable to do. But this is SQL, and lots of type annotations are
un-SQL-like.

Regards,
Jeff Davis

#5Florian Pflug
fgp@phlo.org
In reply to: Jeff Davis (#1)
Re: Range Types, constructors, and the type system

On Jun26, 2011, at 00:29 , Jeff Davis wrote:

declare the return type of a function, and then use the declared type to
infer the argument types. That would be nice because you would just have
to do:
range(1,10)::int8range
However, that's kind of backwards from how our type inference system
works now, and sounds like a big change.

Well, there actually *is* some precedence for that kind of top-down
(form a syntactic perspective) type inference. We *enforce* the cast
in
array[]::<arraytype>
and actually for a very similar reason - without the case, there's no
way of knowing which type of empty array was meant. I think we also
special-case
'literal'::<type>
to use the input function of type directly, instead of first creating
a text value and later casting it to <type>.

best regards,
Florian Pflug

#6Jeff Davis
pgsql@j-davis.com
In reply to: Florian Pflug (#5)
Re: Range Types, constructors, and the type system

On Mon, 2011-06-27 at 00:56 +0200, Florian Pflug wrote:

Well, there actually *is* some precedence for that kind of top-down
(form a syntactic perspective) type inference. We *enforce* the cast
in
array[]::<arraytype>
and actually for a very similar reason - without the case, there's no
way of knowing which type of empty array was meant. I think we also

That's a good point.

Although, I'm not sure whether that's an argument that we can make the
type system work as-is, or if it means that we should add syntax like
ARRAY[].

special-case
'literal'::<type>
to use the input function of type directly, instead of first creating
a text value and later casting it to <type>.

That is certainly true. Quoted strings never start out as text, they
start out as "unknown" and wait for the type inference to determine the
type. I'm not entirely sure whether a quoted string followed by a cast
is briefly unknown and then cast, or if it's directly interpreted using
the cast's type input function.

I don't know if that's a good example though because it's near the end
of the line and there's no function call in between the arguments and
the cast. It might get more complex with cases like:

range(lower(range(1,2)),upper(range(1,2)))::int8range

but maybe that can be done more easily than I think?

Regards,
Jeff Davis

#7Darren Duncan
darren@darrenduncan.net
In reply to: Tom Lane (#3)
Re: Range Types, constructors, and the type system

Tom Lane wrote:

Darren Duncan <darren@darrenduncan.net> writes:

I believe that the best general solution here is for every ordered base type to
just have a single total order, which is always used with that type in any
generic order-sensitive operation, including any ranges defined over it, or any
ORDER BY or any <,>,etc.

We've spent years and blood on making sure that Postgres could support
multiple orderings for any datatype; and there are plenty of natural
examples for the usefulness of that. So I'm not at all impressed by
any line of reasoning that starts out by baldly throwing that away.

I'm not saying that you can't use multiple orderings with a data type. I'm just
saying that the type only has *at most* one (possibly none) *native* ordering,
which is what is used when you do something ordered-sensitive with the type,
such as have a range.

To be specific, if the type system supports a concept like Perl 6 roles (or
other languages have similar concepts) where types can declare themselves
members of a union type such as "Ordered", then types of that union would have
the native ordering and other types wouldn't and then generic range operators
could be declared over ANYORDERED or such.

When we want to have some other "native" ordering for an existing type, such as
when we want to use a different text collation, we do this by creating a *new
base type*,

Nope. This has all sorts of problems that you're conveniently ignoring,
beginning with the need to duplicate all of the infrastructure for the
type (such as non-ordering-related operators), and then moving into
difficulties arising from added ambiguity as to which operator is meant.

Well a related solution is to have exactly 1 text wrapper type which has 2
attributes, one being the text value and the other being the collation name.
Then you just have 1 type that does the job instead a separate one per
collation. But to keep the semantics, the collation name is part of the
identity of the type. For example:

CREATE TYPE collated_text AS (t text, c collation);

The key point I'm trying to support is that collation issues are firmly attached
to the text type, not the range type.

Anyway, if a better solution can be arrived at for the problem at hand, then
good for the team; meanwhile, what I've proposed is the best one I can think of.

-- Darren Duncan

#8Darren Duncan
darren@darrenduncan.net
In reply to: Jeff Davis (#4)
Re: Range Types, constructors, and the type system

Jeff Davis wrote:

On Sun, 2011-06-26 at 00:57 -0700, Darren Duncan wrote:

I believe that the best general solution here is for every ordered base type to
just have a single total order, which is always used with that type in any
generic order-sensitive operation, including any ranges defined over it, or any
ORDER BY or any <,>,etc. The built-in generic text type would have exactly 1
system-defined collation that can't be changed, and it would be something simple
and generic, such as simply sorting on the codepoint as integers.

Well, we're trying to support SQL, and SQL supports collations, so I
don't think we can just ignore that.

I'm not saying you can't support collations. See also my reply to Tom.

I also agree with Tom that it's not a good idea. My reasons are:

* Practical considerations, such as having a bunch of cruft from
duplicated types all over the system. With sufficient changes to the
type system, maybe that could be overcome. Or perhaps domains could be
used to make that work for range types (sort of), but the result would
not be very consistent with the rest of the system.

Yes, duplication can be avoided.

* It doesn't seem to be based in any mathematical argument. A type is a
set of values, and there's no reason it can't have several total orders;
or no total order at all. So it appears to just be piggybacking on the
type system infrastructure as a place to hold the metadata for a total
order.

Yes, I agree that a type is a set of values, and a type can have 0..N total
orders. My proposal is just that, for those types that have at least 1 total
order, exactly 1 of those is defined to be used implicitly in contexts where a
total order is desired and no explicit collation is given, such as in ranges.

* Who's to say that a "compare" function is the only way to specify a
total order? There might be other interfaces that would support
something closer to a lexicographic sort. So, from a theoretical
standpoint, trying to attach a single notion of total order to a type
seems strange, because there might be multiple interfaces for specifying
even one total order.

Thank you for bringing this up, the notion of multiple interfaces for specifying
even one total order. My example of a compare function was just an example, and
it is valuable to consider that this may not be the only way to do it.

* It would require extra explicit type annotations. If you have 12 text
types, the only way to practically use any text type is to constantly
specify which more-specific text type it actually is (probably using
the :: operator). That is not necessarily a bad choice if starting a
language from scratch and forming the syntax in a way that it's
reasonable to do. But this is SQL, and lots of type annotations are
un-SQL-like.

Well sometimes it doesn't hurt to suggest solutions from the point of view that
one can start the language from scratch, because that provides a clean way to
conceptualize and explain a feature. And then people can find some middle
ground that adapts benefits from that idea for the feature without changing SQL
more than needed. Witness the various improvements to Perl 5 that were first
expressed in terms of Perl 6.

-- Darren Duncan

#9Florian Pflug
fgp@phlo.org
In reply to: Jeff Davis (#6)
Re: Range Types, constructors, and the type system

On Jun27, 2011, at 02:48 , Jeff Davis wrote:

On Mon, 2011-06-27 at 00:56 +0200, Florian Pflug wrote:

Well, there actually *is* some precedence for that kind of top-down
(form a syntactic perspective) type inference. We *enforce* the cast
in
array[]::<arraytype>
and actually for a very similar reason - without the case, there's no
way of knowing which type of empty array was meant. I think we also

That's a good point.

Although, I'm not sure whether that's an argument that we can make the
type system work as-is, or if it means that we should add syntax like
ARRAY[].

It was meant as an argument for the former, i.e. for extending the type
system (or rather the function call syntax, as I argue below).

special-case
'literal'::<type>
to use the input function of type directly, instead of first creating
a text value and later casting it to <type>.

That is certainly true. Quoted strings never start out as text, they
start out as "unknown" and wait for the type inference to determine the
type. I'm not entirely sure whether a quoted string followed by a cast
is briefly unknown and then cast, or if it's directly interpreted using
the cast's type input function.

It's at least labelled with type "unknown" for a while AFAIK.

I don't know if that's a good example though because it's near the end
of the line and there's no function call in between the arguments and
the cast. It might get more complex with cases like:

range(lower(range(1,2)),upper(range(1,2)))::int8range

but maybe that can be done more easily than I think?

I wouldn't take it that far. What I had in mind was to *only* support
the case where the cast directly follows the function call, i.e. the case
f(...)::type

I view this more as an extension of the function call syntax than of
type inference. In other languages with polymorphism, there usually is
an explicit syntactic construct for specifying the type arguments to
a polymorphic function. For example, C++ you'd write
make_range<int>(3,4)
to call the polymorphic function make_range() with it's (first)
type argument set to "int". I think of
f(...)::type
as essentially the same thing, but re-using already existing syntax
instead of inventing new one.

I just checked - we currently special case "array[]::type" in transformExpr()
by detecting the case of an array expression being the immediate child
of a cast expression. I suggest we do the same for "f(...)::type", i.e.
also special case a function call being the immediate child of a cast
expression and pass down the forced result type to the function call node.

Function call nodes would then usually ignore that passed-down result type,
except in the case of a polymorphic functions whose argument types don't
uniquely define its result type.

But I haven't tried doing that, so there might be stumbling block down
that road that I missed...

best regards,
Florian Pflug

#10Jeff Davis
pgsql@j-davis.com
In reply to: Florian Pflug (#9)
Re: Range Types, constructors, and the type system

On Mon, 2011-06-27 at 12:16 +0200, Florian Pflug wrote:

I wouldn't take it that far. What I had in mind was to *only* support
the case where the cast directly follows the function call, i.e. the case
f(...)::type

OK, so instead of writing:
range(lower(range(1,2)),upper(range(1,2)))::int8range

users would write:
range(lower(range(1,2)::int8range),upper(range(1,2)::int8range))::int8range

A little more verbose, but it seems like it wouldn't be a practical
problem in very many cases. Multiple levels of constructors seem like
they'd be fairly uncommon, and probably a case where a function should
be written anyway.

OK, I'll have to think about this a little more, but it seems like a
reasonable approach.

Regards,
Jeff Davis

#11Jeff Davis
pgsql@j-davis.com
In reply to: Darren Duncan (#7)
Re: Range Types, constructors, and the type system

On Sun, 2011-06-26 at 22:29 -0700, Darren Duncan wrote:

Tom Lane wrote:

Darren Duncan <darren@darrenduncan.net> writes:

I believe that the best general solution here is for every ordered base type to
just have a single total order, which is always used with that type in any
generic order-sensitive operation, including any ranges defined over it, or any
ORDER BY or any <,>,etc.

We've spent years and blood on making sure that Postgres could support
multiple orderings for any datatype; and there are plenty of natural
examples for the usefulness of that. So I'm not at all impressed by
any line of reasoning that starts out by baldly throwing that away.

I'm not saying that you can't use multiple orderings with a data type. I'm just
saying that the type only has *at most* one (possibly none) *native* ordering,
which is what is used when you do something ordered-sensitive with the type,
such as have a range.

So, are you saying that it would be impossible to have a range that uses
a different ordering? What about ORDER BY? What about BTrees?

And if those things can use different orders for the same type, then
what is the difference between what you are suggesting and a default
ordering for the type (which we already support)?

I suppose it's hard to tell what you mean by "native".

Regards,
Jeff Davis

#12Darren Duncan
darren@darrenduncan.net
In reply to: Jeff Davis (#11)
Re: Range Types, constructors, and the type system

Jeff Davis wrote:

On Sun, 2011-06-26 at 22:29 -0700, Darren Duncan wrote:

Tom Lane wrote:

Darren Duncan <darren@darrenduncan.net> writes:

I believe that the best general solution here is for every ordered base type to
just have a single total order, which is always used with that type in any
generic order-sensitive operation, including any ranges defined over it, or any
ORDER BY or any <,>,etc.

We've spent years and blood on making sure that Postgres could support
multiple orderings for any datatype; and there are plenty of natural
examples for the usefulness of that. So I'm not at all impressed by
any line of reasoning that starts out by baldly throwing that away.

I'm not saying that you can't use multiple orderings with a data type. I'm just
saying that the type only has *at most* one (possibly none) *native* ordering,
which is what is used when you do something ordered-sensitive with the type,
such as have a range.

So, are you saying that it would be impossible to have a range that uses
a different ordering? What about ORDER BY? What about BTrees?

And if those things can use different orders for the same type, then
what is the difference between what you are suggesting and a default
ordering for the type (which we already support)?

I suppose it's hard to tell what you mean by "native".

Regards,
Jeff Davis

Maybe I'm just talking about "default ordering" then. -- Darren Duncan

#13Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#1)
Re: Range Types, constructors, and the type system

On Sat, Jun 25, 2011 at 6:29 PM, Jeff Davis <pgsql@j-davis.com> wrote:

Different ranges over the same subtype make sense when using different
total orders for the subtype. This is most apparent with text collation,
but makes sense (at least mathematically, if not practically) for any
subtype.

For instance:
 [a, Z)
is a valid range in "en_US", but not in "C", so it makes sense to have
multiple ranges over the same subtype with different collations.

But what if you have a function (like a constructor), of the form:
 (anyelement, anyelement) -> anyrange
? To work with the type system, you need to be able to figure out the
return type from the arguments; which means to support functions like
this we need a mapping from the subtype to the range type.
Unfortunately, that restricts us to one range type per subtype (this
isn't a problem for ARRAYs, because there is only one useful array type
for a given element type).

This problem first came up a while ago:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php

My workaround was to use domains, but that's not a very clean solution
(you have to add a bunch of casts to make sure the right domain is
chosen). It became entirely unworkable with collations, because people
would be using different text collations a lot more frequently than,
say, a different ordering for timestamptz. Tom mentioned that here:

http://archives.postgresql.org/message-id/24831.1308579443@sss.pgh.pa.us

I think Florian proposed the most promising line of attack here:

http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org

by suggesting that functions of the form:
 (anyelement, [other non-anyrange arguments]) -> anyrange
might be expendable. After all, they are only useful for constructors as
far as we can tell. Other range functions will have an anyrange
parameter, and we can use the actual type of the argument to know the
range type (as well as the subtype).

Although it's very nice to be able to say:
 range(1,10)
and get an int4range out of it, it's not the only way, and it's not
without its problems anyway. For instance, to get an int8range you have
to do:
 range(1::int8, 10::int8)
or similar.

So, we could just make functions like:
 int4range(int4, int4)
 int8range(int8, int8)
 ...
when creating the range type, and it would actually be a usability
improvement.

Couldn't we also do neither of these things? I mean, presumably
'[1,10]'::int8range had better work.

I'm not saying that's ideal from a usability perspective but I fear
this patch is going to be unmanageably large, and separating out the
things that you need for it to work at all from the things that you
need in order for it to be convenient might have some merit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#13)
Re: Range Types, constructors, and the type system

On Mon, 2011-06-27 at 14:50 -0400, Robert Haas wrote:

Couldn't we also do neither of these things? I mean, presumably
'[1,10]'::int8range had better work.

I think that if we combine this idea with Florian's "PAIR" suggestion
here:
http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org

then I think we have a solution.

If we add a type RANGEINPUT that is not a pseudotype, we can use that as
an intermediate type that is returned by range constructors. Then, we
add casts from RANGEINPUT to each range type. That would allow
range(1,2)::int8range
to work without changing the type system around, because range() would
have the signature:
range(ANYELEMENT, ANYELEMENT) -> RANGEINPUT
and then the cast would change it into an int8range. But we only need
the one cast per range type, and we can also support all of the other
kinds of constructors like:
range_cc(ANYELEMENT, ANYELEMENT) -> RANGEINPUT
range_linf_c(ANYELEMENT) -> RANGEINPUT
without additional hassle.

The RANGEINPUT type itself would hold similar information to actual
range types: the subtype OID (instead of the range type, because it's
not a range yet), optionally the two bounds (depending on the flags),
and the flags byte. The cast to a real range type would read the
subtype, and try to coerce the bounds to the subtype of the range you're
casting to, set the range type oid, leave the flags byte the same, and
it's done.

So, in effect, RANGEINPUT is a special type used only for range
constructors. If someone tried to output it, it would throw an
exception, and we'd even have enough information at that point to print
a nice error message with a hint.

Actually, this is pretty much exactly Florian's idea (thanks again,
Florian), but at the time I didn't like it because "pair" didn't capture
everything that I wanted to capture, like infinite bounds, etc. But
there's no reason that it can't, and your point made me realize that --
you are effectively just using TEXT as the intermediate type (which
works, but has some undesirable characteristics).

Do we think that this is a good way forward? The only thing I can think
of that's undesirable is that it's not normal to be required to cast the
result of a function, and might be slightly difficult to explain in the
documentation in a straightforward way.

Regards,
Jeff Davis

#15Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#14)
Re: Range Types, constructors, and the type system

On Mon, Jun 27, 2011 at 11:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:

So, in effect, RANGEINPUT is a special type used only for range
constructors. If someone tried to output it, it would throw an
exception, and we'd even have enough information at that point to print
a nice error message with a hint.

I don't think I like the idea of throwing an error when you try to
output it, but the rest seems reasonably sensible.

Actually, this is pretty much exactly Florian's idea (thanks again,
Florian), but at the time I didn't like it because "pair" didn't capture
everything that I wanted to capture, like infinite bounds, etc. But
there's no reason that it can't, and your point made me realize that --
you are effectively just using TEXT as the intermediate type (which
works, but has some undesirable characteristics).

What undesirable characteristics?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16David E. Wheeler
david@kineticode.com
In reply to: Jeff Davis (#14)
Re: Range Types, constructors, and the type system

On Jun 27, 2011, at 8:42 PM, Jeff Davis wrote:

Do we think that this is a good way forward? The only thing I can think
of that's undesirable is that it's not normal to be required to cast the
result of a function, and might be slightly difficult to explain in the
documentation in a straightforward way

That's the part that bothers me. I think that if it's not cast it should somehow be useful. Maybe default to a text range or something?

Best,

David

#17Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#15)
Re: Range Types, constructors, and the type system

On Tue, 2011-06-28 at 10:58 -0400, Robert Haas wrote:

On Mon, Jun 27, 2011 at 11:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:

So, in effect, RANGEINPUT is a special type used only for range
constructors. If someone tried to output it, it would throw an
exception, and we'd even have enough information at that point to print
a nice error message with a hint.

I don't think I like the idea of throwing an error when you try to
output it, but the rest seems reasonably sensible.

I thought it might add a little confusion if people thought they had a
range type but really had RANGEINPUT. For instance, if you do a "create
table as select range(1,2)" then the result might be slightly
unexpected.

But it's probably no more unexpected than "create table as select
'foo'". So, I suppose there's not much reason to throw an error. We can
just output it in the same format as a range type.

It's also much easier to explain something in the documentation that has
an output format, because at least it's tangible.

Actually, this is pretty much exactly Florian's idea (thanks again,
Florian), but at the time I didn't like it because "pair" didn't capture
everything that I wanted to capture, like infinite bounds, etc. But
there's no reason that it can't, and your point made me realize that --
you are effectively just using TEXT as the intermediate type (which
works, but has some undesirable characteristics).

What undesirable characteristics?

Well, for one, outputting something as text and then reading it back in
does not always produce the same value. For instance, for float, it only
does that if you have extra_float_digits set to some high-enough value.
I suppose I could save the GUC, set it, and set it back; but that seems
like unnecessary ugliness.

There's also the deparsing/reparsing cycle. That might not really matter
for performance, but it seems unnecessary.

And there's always the fallback that "we have types for a reason".
Wouldn't it be odd if you wrote a query like:
select range(1,2) || 'foo'
and it succeeded? I'm sure that kind of thing can lead to some dangerous
situations.

Regards,
Jeff Davis

#18Jeff Davis
pgsql@j-davis.com
In reply to: David E. Wheeler (#16)
Re: Range Types, constructors, and the type system

On Tue, 2011-06-28 at 09:30 -0700, David E. Wheeler wrote:

On Jun 27, 2011, at 8:42 PM, Jeff Davis wrote:

Do we think that this is a good way forward? The only thing I can think
of that's undesirable is that it's not normal to be required to cast the
result of a function, and might be slightly difficult to explain in the
documentation in a straightforward way

That's the part that bothers me.

Yeah, that bothered me, too.

I think that if it's not cast it should somehow be useful.

Let's see, what can one do with a range that has no ordering yet? ;)

Robert suggested that we don't need to throw an error, and I think I
agree. Just having a working output function solves most of the
documentation problem, because it makes it less abstract.

The only operators that we could really support are accessors, which
seems somewhat reasonable. However, I'd have some concerns even about
that, because if you do range(10,1), then what's the upper bound?

Maybe default to a text range or something?

That sounds a little dangerous:
select range('1','09')
would fail before it could be cast to int4range.

We could invent an UNKNOWNRANGE type or something. But I don't
particularly like that; it would start out working nicely when people
only had one textrange type, and then their old queries would start
failing when they added another range type based on text.

I think it's fine if the RANGEINPUT type isn't too useful by itself.
It's already a common requirement to cast unknown literals, and this
isn't too much different. It's only for constructors, so it still fits
pretty closely with that idea.

Regards,
Jeff Davis

#19Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#17)
Re: Range Types, constructors, and the type system

On Tue, Jun 28, 2011 at 12:58 PM, Jeff Davis <pgsql@j-davis.com> wrote:

On Tue, 2011-06-28 at 10:58 -0400, Robert Haas wrote:

On Mon, Jun 27, 2011 at 11:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:

So, in effect, RANGEINPUT is a special type used only for range
constructors. If someone tried to output it, it would throw an
exception, and we'd even have enough information at that point to print
a nice error message with a hint.

I don't think I like the idea of throwing an error when you try to
output it, but the rest seems reasonably sensible.

I thought it might add a little confusion if people thought they had a
range type but really had RANGEINPUT. For instance, if you do a "create
table as select range(1,2)" then the result might be slightly
unexpected.

True...

But it's probably no more unexpected than "create table as select
'foo'". So, I suppose there's not much reason to throw an error. We can
just output it in the same format as a range type.

+1.

It's also much easier to explain something in the documentation that has
an output format, because at least it's tangible.

+1.

Actually, this is pretty much exactly Florian's idea (thanks again,
Florian), but at the time I didn't like it because "pair" didn't capture
everything that I wanted to capture, like infinite bounds, etc. But
there's no reason that it can't, and your point made me realize that --
you are effectively just using TEXT as the intermediate type (which
works, but has some undesirable characteristics).

What undesirable characteristics?

Well, for one, outputting something as text and then reading it back in
does not always produce the same value. For instance, for float, it only
does that if you have extra_float_digits set to some high-enough value.
I suppose I could save the GUC, set it, and set it back; but that seems
like unnecessary ugliness.

Yeah, I don't think we want to go there.

There's also the deparsing/reparsing cycle. That might not really matter
for performance, but it seems unnecessary.

And there's always the fallback that "we have types for a reason".
Wouldn't it be odd if you wrote a query like:
 select range(1,2) || 'foo'
and it succeeded? I'm sure that kind of thing can lead to some dangerous
situations.

That's pretty much what we tried to get rid of with the 8.3 casting
changes, so agreed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Florian Pflug
fgp@phlo.org
In reply to: Jeff Davis (#14)
Re: Range Types, constructors, and the type system

On Jun28, 2011, at 05:42 , Jeff Davis wrote:

On Mon, 2011-06-27 at 14:50 -0400, Robert Haas wrote:

Couldn't we also do neither of these things? I mean, presumably
'[1,10]'::int8range had better work.

I think that if we combine this idea with Florian's "PAIR" suggestion
here:
http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org

then I think we have a solution.

If we add a type RANGEINPUT that is not a pseudotype, we can use that as
an intermediate type that is returned by range constructors. Then, we
add casts from RANGEINPUT to each range type. That would allow
range(1,2)::int8range
to work without changing the type system around, because range() would
have the signature:
range(ANYELEMENT, ANYELEMENT) -> RANGEINPUT
and then the cast would change it into an int8range. But we only need
the one cast per range type, and we can also support all of the other
kinds of constructors like:
range_cc(ANYELEMENT, ANYELEMENT) -> RANGEINPUT
range_linf_c(ANYELEMENT) -> RANGEINPUT
without additional hassle.

Hm, so RANGEINPUT would actually be what was previously discussed as
the "range as a pair of bounds" definition, as opposed to the
"range as a set of values" definition. So essentially we'd add a
second concept of what a "range" is to work around the range input
troubles.

I believe if we go that route we should make RANGEINPUT a full-blown
type, having "pair of bound" semantics. Adding a lobotomized version
just for the sake of range input feels a bit like a kludge to me.

Alternatively, we could replace RANGEINPUT simply with ANYARRAY,
and add functions ANYRANGE->ANYRANGE which allow specifying the
bound operator (<, <= respectively >,>=) after construction.

So you'd write (using the functions-as-fields syntax I believe
we support)
(ARRAY[1,2]::int8range).left_open.right_closed for '(1,2]'
and
ARRAY[NULL,2]::int8range for '[-inf,2]'

All assuming that modifying the type system to support polymorphic
type resolution based on the return type is out of the question... ;-)

best regards,
Florian Pflug

#21Jeff Davis
pgsql@j-davis.com
In reply to: Florian Pflug (#20)
#22Florian Pflug
fgp@phlo.org
In reply to: Jeff Davis (#21)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#21)
#24Jeff Davis
pgsql@j-davis.com
In reply to: Florian Pflug (#22)
#25Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#23)
#26Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#24)
#27David E. Wheeler
david@kineticode.com
In reply to: Jeff Davis (#21)
#28David E. Wheeler
david@kineticode.com
In reply to: Jeff Davis (#24)
#29Florian Pflug
fgp@phlo.org
In reply to: Robert Haas (#26)
#30Florian Pflug
fgp@phlo.org
In reply to: David E. Wheeler (#28)
#31David E. Wheeler
david@kineticode.com
In reply to: Florian Pflug (#30)
#32Peter Eisentraut
peter_e@gmx.net
In reply to: David E. Wheeler (#31)
#33Florian Pflug
fgp@phlo.org
In reply to: Peter Eisentraut (#32)
#34Peter Eisentraut
peter_e@gmx.net
In reply to: Florian Pflug (#33)
#35Florian Pflug
fgp@phlo.org
In reply to: Peter Eisentraut (#34)
#36Florian Pflug
fgp@phlo.org
In reply to: Jeff Davis (#24)
#37Jeff Davis
pgsql@j-davis.com
In reply to: Florian Pflug (#35)
#38Jeff Davis
pgsql@j-davis.com
In reply to: David E. Wheeler (#31)
#39Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#26)
#40David E. Wheeler
david@kineticode.com
In reply to: Jeff Davis (#37)
#41David E. Wheeler
david@kineticode.com
In reply to: Jeff Davis (#38)
#42Jeff Davis
pgsql@j-davis.com
In reply to: David E. Wheeler (#40)
#43Jeff Davis
pgsql@j-davis.com
In reply to: David E. Wheeler (#41)
#44Jeff Davis
pgsql@j-davis.com
In reply to: Florian Pflug (#36)
#45Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#44)
#46Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#45)
#47Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#46)
#48Florian Pflug
fgp@phlo.org
In reply to: Jeff Davis (#46)
#49Merlin Moncure
mmoncure@gmail.com
In reply to: Robert Haas (#47)
#50Robert Haas
robertmhaas@gmail.com
In reply to: Merlin Moncure (#49)
#51Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#47)
#52Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#51)
#53Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#52)
#54Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#53)
#55Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#54)
#56Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#55)
#57Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#56)
#58Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#57)
#59Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#58)