MULTISET and additional functions for ARRAY
Postgres supports ARRAY data types well, but there are some
more array functions in the SQL standard. Also, the standard
has MULTISET data type, that is an unordered array.
It looks easy to support additional array functions. There
might be some confusion to treat multi-dimensional arrays
with them, but we could treat all arrays as one-dimensional
like as unnest().
MULTISET supports are more difficult. We have corresponding
type IDs for each array, but we might not want to add additional
IDs for multiset for each type. Any ideas for the issue?
If we reuse type IDs of arrays for multisets, the multisets would
have some special typmod. For example, typmod = 0 means multiset,
and positive value means array with max cardinality. Note that
the SQL standard doesn't mention about multi-dimensional arrays.
So, we can use typmod = -1 as a free-size and free-dimensional
array for backward compatibility.
If we have troublesome issues to support multiset data types,
I'm thinking to add multiset functions that receives ARRAY
types instead at time first time, because an ARRAY is a
MULTISET by definition. Some of functions for multisets
seems to be useful for arrays, too.
Comments and suggestions welcome.
=== Array functions ===
- [FUNCTION] cardinality(anyarray) => integer
- [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray
=== Multiset functions ===
- [FUNCTION] cardinality(anymultiset) => integer
- [FUNCTION] element(anymultiset) => anyelement
- [FUNCTION] multiset_member_of(anymultiset, anyelement) => boolean
[SYNTAX] $2 MEMBER OF $1
- [FUNCTION] multiset_is_a_set(anymultiset) => boolean
[SYNTAX] $1 IS A SET
- [FUNCTION] multiset_sub_multiset_of(anymultiset, anymultiset) => boolean
[SYNTAX] $2 SUB MULTISET OF $1
- [FUNCTION] multiset_union[_all](anymultiset, anymultiset) => anymultiset
[SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2
- [FUNCTION] multiset_intersect[_all](anymultiset, anymultiset) => anymultiset
[SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $2
- [FUNCTION] multiset_except[_all](anymultiset, anymultiset) => anymultiset
[SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $2
- [AGGREGATE] collect(anyelement) => anymultiset
- [AGGREGATE] fusion(anymultiset) => anymultiset
- [AGGREGATE] intersection(anymultiset) => anymultiset
See also secondary sources.
http://waelchatila.com/2005/05/18/1116485743467.html
http://farrago.sourceforge.net/design/CollectionTypes.html
http://publib.boulder.ibm.com/infocenter/db2luw/v9r8/index.jsp?topic=/com.ibm.db2.luw.apdv.sqlpl.doc/doc/t0053486.html
http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/conditions006.htm
http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/operators006.htm
--
Itagaki Takahiro
On Nov 11, 2010, at 7:02 AM, Itagaki Takahiro wrote:
MULTISET supports are more difficult. We have corresponding
type IDs for each array, but we might not want to add additional
IDs for multiset for each type. Any ideas for the issue?
Why not?
If we reuse type IDs of arrays for multisets, the multisets would
have some special typmod. For example, typmod = 0 means multiset,
and positive value means array with max cardinality. Note that
the SQL standard doesn't mention about multi-dimensional arrays.
So, we can use typmod = -1 as a free-size and free-dimensional
array for backward compatibility.If we have troublesome issues to support multiset data types,
I'm thinking to add multiset functions that receives ARRAY
types instead at time first time, because an ARRAY is a
MULTISET by definition.
An array is a superset of MULTISET, I guess?
Some of functions for multisets seems to be useful for arrays, too.
Comments and suggestions welcome.
So are you planning to implement multisets? It's a feature I'd love to see…
Best,
David
"David E. Wheeler" <david@kineticode.com> writes:
So are you planning to implement multisets? It's a feature I'd love to see
What actual functionality does it buy? AFAICT from Itagaki-san's
description, it's an array only you ignore the specific element order.
So what? You can write functions that work that way now.
regards, tom lane
On Nov 11, 2010, at 10:05 AM, Tom Lane wrote:
So are you planning to implement multisets? It's a feature I'd love to see
What actual functionality does it buy? AFAICT from Itagaki-san's
description, it's an array only you ignore the specific element order.
So what? You can write functions that work that way now.
Also, no dupes.
David
I think that it would be best to implement MULTISET in the same way that a TABLE
is implemented. Logically and structurally they are the same thing, but that a
MULTISET typically is used as a field value of a table row. Aka, a table and a
multiset are just different names for a relation, loosely speaking.
The association of a multiset-typed attribute of a table with said table is like
the association of a child and parent table in a many-to-one.
So reuse your structure for tables to hold multisets.
-- Darren Duncan
Itagaki Takahiro wrote:
Show quoted text
Postgres supports ARRAY data types well, but there are some
more array functions in the SQL standard. Also, the standard
has MULTISET data type, that is an unordered array.It looks easy to support additional array functions. There
might be some confusion to treat multi-dimensional arrays
with them, but we could treat all arrays as one-dimensional
like as unnest().MULTISET supports are more difficult. We have corresponding
type IDs for each array, but we might not want to add additional
IDs for multiset for each type. Any ideas for the issue?If we reuse type IDs of arrays for multisets, the multisets would
have some special typmod. For example, typmod = 0 means multiset,
and positive value means array with max cardinality. Note that
the SQL standard doesn't mention about multi-dimensional arrays.
So, we can use typmod = -1 as a free-size and free-dimensional
array for backward compatibility.If we have troublesome issues to support multiset data types,
I'm thinking to add multiset functions that receives ARRAY
types instead at time first time, because an ARRAY is a
MULTISET by definition. Some of functions for multisets
seems to be useful for arrays, too.Comments and suggestions welcome.
2010/11/11 David E. Wheeler <david@kineticode.com>:
On Nov 11, 2010, at 10:05 AM, Tom Lane wrote:
So are you planning to implement multisets? It's a feature I'd love to see
What actual functionality does it buy? AFAICT from Itagaki-san's
description, it's an array only you ignore the specific element order.
So what? You can write functions that work that way now.Also, no dupes.
The "multi" in multiset indicates that duplicate elements are
explicitly allowed and tracked.
Nicolas
On Nov 11, 2010, at 10:19 AM, Darren Duncan wrote:
I think that it would be best to implement MULTISET in the same way that a TABLE is implemented. Logically and structurally they are the same thing, but that a MULTISET typically is used as a field value of a table row. Aka, a table and a multiset are just different names for a relation, loosely speaking.
That sounds like a composite type to me.
Best,
David
On Nov 11, 2010, at 10:24 AM, Nicolas Barbier wrote:
Also, no dupes.
The "multi" in multiset indicates that duplicate elements are
explicitly allowed and tracked.
D'oh! Right.
D
Excerpts from David E. Wheeler's message of jue nov 11 15:45:55 -0300 2010:
On Nov 11, 2010, at 10:19 AM, Darren Duncan wrote:
I think that it would be best to implement MULTISET in the same way that a TABLE is implemented. Logically and structurally they are the same thing, but that a MULTISET typically is used as a field value of a table row. Aka, a table and a multiset are just different names for a relation, loosely speaking.
That sounds like a composite type to me.
No, it's "perpendicular" in the sense that while a composite type allows
you to have different columns, this multiset thing lets you have "rows"
(I initially thought about them as sets of scalars, but AFAIU they could
in turn be rows)
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Nov 11, 2010, at 12:08 PM, Alvaro Herrera wrote:
That sounds like a composite type to me.
No, it's "perpendicular" in the sense that while a composite type allows
you to have different columns, this multiset thing lets you have "rows"
(I initially thought about them as sets of scalars, but AFAIU they could
in turn be rows)
How is that different from an array of RECORDs?
Best,
David
On Thu, Nov 11, 2010 at 3:42 PM, David E. Wheeler <david@kineticode.com> wrote:
On Nov 11, 2010, at 12:08 PM, Alvaro Herrera wrote:
That sounds like a composite type to me.
No, it's "perpendicular" in the sense that while a composite type allows
you to have different columns, this multiset thing lets you have "rows"
(I initially thought about them as sets of scalars, but AFAIU they could
in turn be rows)How is that different from an array of RECORDs?
not a whole lot outside of syntax details...there is a decent summary
here: http://waelchatila.com/2005/05/18/1116485743467.html
I like this part: "Alternatively the SQL standard also permits the
same construct with the bracket trigraphs ??( and ??)" :-D
merlin
Merlin Moncure wrote:
On Thu, Nov 11, 2010 at 3:42 PM, David E. Wheeler <david@kineticode.com> wrote:
On Nov 11, 2010, at 12:08 PM, Alvaro Herrera wrote:
That sounds like a composite type to me.
No, it's "perpendicular" in the sense that while a composite type allows
you to have different columns, this multiset thing lets you have "rows"
(I initially thought about them as sets of scalars, but AFAIU they could
in turn be rows)How is that different from an array of RECORDs?
I could ask the same question about a TABLE, the ordering issue aside.
This is one place that SQL made things more complicated than they needed to be.
Multisets have generally the same structure *and* operators (union, etc) as
tables, but they use different syntax for each. A better design would be to
make tables and multisets interchangeable. Its an unnecessary distinction.
not a whole lot outside of syntax details...there is a decent summary
here: http://waelchatila.com/2005/05/18/1116485743467.htmlI like this part: "Alternatively the SQL standard also permits the
same construct with the bracket trigraphs ??( and ??)" :-D
As I recall, the concept of using stuff like ?( or ?) etc was so that SQL could
be written in EBCDIC which natively lacks some of the bracketing characters that
ASCII has. Hence, such is an alternative way to spell either { } or [ ] (I
forget which).
-- Darren Duncan
On Fri, Nov 12, 2010 at 03:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David E. Wheeler" <david@kineticode.com> writes:
So are you planning to implement multisets? It's a feature I'd love to see
What actual functionality does it buy? AFAICT from Itagaki-san's
description, it's an array only you ignore the specific element order.
So what? You can write functions that work that way now.
I think there are almost no difference between a multiset and an array
in terms of functions I described in the first mail.
However, if we have separated multiset data type, we could have special
comparison operators for them; "array = array" returns true only if they
have the same elements in the same order, but "multiset = multiset" only
checks elements in them. Also, we could optimize on-disk structure of
multiset for fast UNION operations or for dataset that has many duplicates.
For example, we could use a sorted array of {value, count} pairs.
If we decide to have data type IDs for multiset, I'll go for it (ex. int4,
_int4, and an additional $int4), but it consumes +50% of typoids. If it
is not preferable, only function support might be better at the first try.
--
Itagaki Takahiro
On Fri, Nov 12, 2010 at 06:06, Darren Duncan <darren@darrenduncan.net> wrote:
This is one place that SQL made things more complicated than they needed to
be. Multisets have generally the same structure *and* operators (union,
etc) as tables, but they use different syntax for each. A better design
would be to make tables and multisets interchangeable. Its an unnecessary
distinction.
We can use unnest() to convert MULTISET into TABLE, and collect() agg
function from TABLE to MULTISET. I don't think they need to have the
same on-disk structure; they can share operators and constructor syntax
even now.
--
Itagaki Takahiro
On Thu, Nov 11, 2010 at 10:02 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
If we reuse type IDs of arrays for multisets, the multisets would
have some special typmod. For example, typmod = 0 means multiset,
and positive value means array with max cardinality. Note that
the SQL standard doesn't mention about multi-dimensional arrays.
So, we can use typmod = -1 as a free-size and free-dimensional
array for backward compatibility.
I would really like to see us fix our type system so that it doesn't
require this type of awful hack. But maybe that's asking too much of
a patch to implement this feature.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Nov 11, 2010 at 10:02 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:If we reuse type IDs of arrays for multisets, the multisets would
have some special typmod. For example, typmod = 0 means multiset,
and positive value means array with max cardinality. Note that
the SQL standard doesn't mention about multi-dimensional arrays.
So, we can use typmod = -1 as a free-size and free-dimensional
array for backward compatibility.
I would really like to see us fix our type system so that it doesn't
require this type of awful hack. But maybe that's asking too much of
a patch to implement this feature.
The problem is not with the type system: as long as you give multisets
different type OIDs from arrays, everything will work fine. It will
absolutely not work to try to use typmod to make the behavior vary
like that ... but Itagaki-san knew that already.
regards, tom lane
On Fri, Nov 12, 2010 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Nov 11, 2010 at 10:02 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:If we reuse type IDs of arrays for multisets, the multisets would
have some special typmod. For example, typmod = 0 means multiset,
and positive value means array with max cardinality. Note that
the SQL standard doesn't mention about multi-dimensional arrays.
So, we can use typmod = -1 as a free-size and free-dimensional
array for backward compatibility.I would really like to see us fix our type system so that it doesn't
require this type of awful hack. But maybe that's asking too much of
a patch to implement this feature.The problem is not with the type system: as long as you give multisets
different type OIDs from arrays, everything will work fine. It will
absolutely not work to try to use typmod to make the behavior vary
like that ... but Itagaki-san knew that already.
And thus you must create a THIRD copy of every entry in pg_type. That
doesn't qualify as a problem?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Nov 12, 2010 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The problem is not with the type system: as long as you give multisets
different type OIDs from arrays, everything will work fine.
And thus you must create a THIRD copy of every entry in pg_type. That
doesn't qualify as a problem?
[ shrug... ] It's less of a problem than the possible alternatives.
IMO anyway. OIDs are cheap ... replacing OIDs with some sort of
ill-specified composite key throughout the system is not.
But I'm still not convinced that this feature is useful enough to
justify the implementation effort. AFAICS there's nothing here that
you couldn't get with some non-default operators on regular arrays,
with orders of magnitude less work and less impact on the rest of the
system. The only reason to consider implementing it as a separate type
category is the SQL committee decided to invent some syntax --- and
given their lousy taste in syntax, I get less enthused every year about
duplicating it.
regards, tom lane
On Fri, Nov 12, 2010 at 12:53:09AM -0500, Robert Haas wrote:
On Fri, Nov 12, 2010 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Nov 11, 2010 at 10:02 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:If we reuse type IDs of arrays for multisets, the multisets would
have some special typmod. For example, typmod = 0 means multiset,
and positive value means array with max cardinality. Note that
the SQL standard doesn't mention about multi-dimensional arrays.
So, we can use typmod = -1 as a free-size and free-dimensional
array for backward compatibility.I would really like to see us fix our type system so that it doesn't
require this type of awful hack. �But maybe that's asking too much of
a patch to implement this feature.The problem is not with the type system: as long as you give multisets
different type OIDs from arrays, everything will work fine. �It will
absolutely not work to try to use typmod to make the behavior vary
like that ... but Itagaki-san knew that already.And thus you must create a THIRD copy of every entry in pg_type. That
doesn't qualify as a problem?
Yes, and I've started a separate thread on this along with a page on
the wiki.
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Fri, Nov 12, 2010 at 00:02, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
Postgres supports ARRAY data types well, but there are some
more array functions in the SQL standard. Also, the standard
has MULTISET data type, that is an unordered array.
Here is a WIP patch for multiset function supports. Note that multiset
types are not added in the patch; it just adds functions and syntax.
Arguments or result types of those functions are anyarray rather than
anymultiset. The result type is always flatten into on-dimensional
array because some functions requires per-element operations; I'm not
sure how we should treat trim_array( 2D 3x3 array, 2 elements ). So,
it is treated as trim_array( 9 elements array, 2 elements ) in the patch.
The SQL standard defines special syntax for multiset. I added four
unreserved keywords for them; A, MEMBER, MULTISET, and SUB.
(I don't like such ad-hoc syntax, but it is the standard...)
Some of the new expressions are just syntactic sugar for existing
other expressions or new functions. For example, "$1 MEMBER OF $2" is
expanded to "$1 = ANY ($2)" and "$1 IS A SET" to "multiset_is_a_set($1)".
I have not researched the spec yet enough, especially NULLs in collections.
I'll continue to check the details.
BTW, some of the intermediate products to implement those features might
be useful if exported. like array_sort() and array_unique(). If there is
demand, I'll add those functions, too.
Any comments for the approach or detailed features?
=== New features ===
- [FUNCTION] cardinality(anyarray) => integer
- [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray
- [FUNCTION] element(anyarray) => anyelement
- [SYNTAX] $1 MEMBER OF $2 --> $1 = ANY ($2)
- [SYNTAX] $1 SUB MULTISET OF $2 --> $1 <@ $2
- [SYNTAX] $1 IS A SET --> multiset_is_a_set($1)
- [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 -->
multiset_union($1, $2, all?)
- [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $2 -->
multiset_intersect($1, $2, all?)
- [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $2 -->
multiset_except($1, $2, all?)
- [AGGREGATE] collect(anyelement) => anyarray --> same as array_agg()
- [AGGREGATE] fusion(anyarray) => anyarray
- [AGGREGATE] intersection(anyarray) => anyarray
--
Itagaki Takahiro