multiset patch review
Patching:
patching file doc/src/sgml/func.sgml
Hunk #6 succeeded at 10567 (offset 1 line).
Hunk #7 succeeded at 10621 (offset 1 line).
Hunk #8 succeeded at 10721 (offset 1 line).
Hunk #9 succeeded at 10775 (offset 1 line).
patching file src/backend/nodes/makefuncs.c
patching file src/backend/parser/gram.y
patching file src/backend/utils/adt/array_userfuncs.c
patching file src/include/catalog/pg_aggregate.h
patching file src/include/catalog/pg_proc.h
patching file src/include/nodes/makefuncs.h
patching file src/include/parser/kwlist.h
patching file src/include/utils/array.h
Hunk #1 succeeded at 281 (offset 1 line).
patching file src/test/regress/expected/arrays.out
Hunk #1 succeeded at 1558 (offset 272 lines).
patching file src/test/regress/sql/arrays.sql
Hunk #1 succeeded at 438 (offset 12 lines).
Compilation:
there are no warning related to patch
regress tests failed
*** 1639,1647 ****
..
SELECT ARRAY[1, 2] SUBMULTISET OF ARRAY[1, NULL],
ARRAY[1, 2] SUBMULTISET OF ARRAY[3, NULL];
! submultiset_of | submultiset_of ^M
! ----------------+----------------^M
! | f^M
(1 row)
..
SELECT ARRAY[1, NULL] SUBMULTISET OF ARRAY[1, NULL];
--- 1639,1647 ----
..
SELECT ARRAY[1, 2] SUBMULTISET OF ARRAY[1, NULL],
ARRAY[1, 2] SUBMULTISET OF ARRAY[3, NULL];
! submultiset_of | submultiset_of.
! ----------------+----------------
! | f
(1 row)
..
SELECT ARRAY[1, NULL] SUBMULTISET OF ARRAY[1, NULL];
There is often used a fragment
+ <----><------>fn.arg[0] = values1[n1];
+ <----><------>fn.arg[1] = values2[n2];
+ <----><------>fn.argnull[0] = false;
+ <----><------>fn.argnull[1] = false;
+ <----><------>fn.isnull = false;
+ <----><------>r = DatumGetInt32(FunctionCallInvoke(&fn));
it can be moved to procedure?
Doc and regress tests are ok.
I see only one issue. There isn't documented, what is a MULTISET?
Regards
Pavel Stehule
Thank you for the review.
On Mon, Jan 10, 2011 at 04:13, Pavel Stehule <pavel.stehule@gmail.com> wrote:
regress tests failed
Fixed.
There is often used a fragment + <----><------>fn.arg[0] = values1[n1]; + <----><------>fn.arg[1] = values2[n2]; + <----><------>fn.argnull[0] = false; + <----><------>fn.argnull[1] = false; + <----><------>fn.isnull = false; + <----><------>r = DatumGetInt32(FunctionCallInvoke(&fn)); it can be moved to procedure?
Agreed. I use FunctionCall2() instead of the fragments.
I see only one issue. There isn't documented, what is a MULTISET?
I added a short description about MULTISET and example of operators
in "Arrays > 8.14.7. Multiset Support" section in the docs.
Is it enough? or what kind of information do you want?
Separate patches for src and doc attached. It includes a few bug fixes
and cleanup. I changed the error code in trim_array() to
ERRCODE_ARRAY_ELEMENT_ERROR according to the spec.
--
Itagaki Takahiro
On ons, 2011-01-12 at 13:52 +0900, Itagaki Takahiro wrote:
I added a short description about MULTISET and example of operators
in "Arrays > 8.14.7. Multiset Support" section in the docs.
Is it enough? or what kind of information do you want?Separate patches for src and doc attached. It includes a few bug fixes
and cleanup. I changed the error code in trim_array() to
ERRCODE_ARRAY_ELEMENT_ERROR according to the spec.
You may want to read this thread about the cardinality function are you
trying to add:
http://archives.postgresql.org/pgsql-hackers/2009-02/msg01388.php
Also, what happened to the idea of a separate MULTISET type?
On Wed, Jan 12, 2011 at 15:21, Peter Eisentraut <peter_e@gmx.net> wrote:
You may want to read this thread about the cardinality function are you
trying to add:http://archives.postgresql.org/pgsql-hackers/2009-02/msg01388.php
Since our archive is split per month, this might be more readable:
http://postgresql.1045698.n5.nabble.com/cardinality-td2003172.html
We've discussed what number should cardinality() returns:
#1. The total number of elements. (It's currently implemented.)
#2. The length of the first dimension.
It's as same as array_length(array, 1) .
I prefer #1 because we have no easy way to retrieve the number.
array_dims() returns similar numbers, but calculate the total
number is a bit complex.
If we will support array of arrays (jugged array), cardinality()
can return the number of elements in the most outer array.
It's similar definition in multi-dimensional arrays in C#,
that has both array of arrays and multi-dimensional arrays.
http://msdn.microsoft.com/library/system.array.length(v=VS.100).aspx
We can compare those SQL functions and C# array methods:
* cardinality(array) <--> array.Length
* array_length(array. dim) <--> array.GetLength(dim)
Also, what happened to the idea of a separate MULTISET type?
I don't have any plans to implement dedicated MULTISET type for now
because almost all functions and operators can work also for arrays.
If we have a true MULTISET data type, we can overload them with
MULTISET arguments.
One exception might be collect() aggregate function because
we might need to change the result type from array to multiset.
collect(anyelement) => anyarray for now
Note that fusion() won't be an issue because we can overload it:
fusion(anyarray) => anyarray and (anymultiset) => anymultiset
--
Itagaki Takahiro
Hello
there is one issue - probably useless checking a type equality in
function check_comparable and check_concatinatable, because when your
function is registrated with arguments (anyarray, anyarray), then is
guaranteed so type of array1 is same as type of array2, and then you
don't need to check.
Regards
Pavel Stehule
2011/1/12 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
Show quoted text
Thank you for the review.
On Mon, Jan 10, 2011 at 04:13, Pavel Stehule <pavel.stehule@gmail.com> wrote:
regress tests failed
Fixed.
There is often used a fragment + <----><------>fn.arg[0] = values1[n1]; + <----><------>fn.arg[1] = values2[n2]; + <----><------>fn.argnull[0] = false; + <----><------>fn.argnull[1] = false; + <----><------>fn.isnull = false; + <----><------>r = DatumGetInt32(FunctionCallInvoke(&fn)); it can be moved to procedure?Agreed. I use FunctionCall2() instead of the fragments.
I see only one issue. There isn't documented, what is a MULTISET?
I added a short description about MULTISET and example of operators
in "Arrays > 8.14.7. Multiset Support" section in the docs.
Is it enough? or what kind of information do you want?Separate patches for src and doc attached. It includes a few bug fixes
and cleanup. I changed the error code in trim_array() to
ERRCODE_ARRAY_ELEMENT_ERROR according to the spec.--
Itagaki Takahiro
On Wed, Jan 12, 2011 at 20:18, Pavel Stehule <pavel.stehule@gmail.com> wrote:
there is one issue - probably useless checking a type equality in
function check_comparable and check_concatinatable, because when your
function is registrated with arguments (anyarray, anyarray), then is
guaranteed so type of array1 is same as type of array2, and then you
don't need to check.
It's true for almost all cases, but we have "anyarray" columns in
pg_statistic.stavaluesN. When we pass them to those array functions,
element types of two anyarrays could be different.
I guess they are protections only for them.
=# SELECT A.stavalues1 SUBMULTISET OF B.stavalues1
FROM pg_statistic A, pg_statistic B
WHERE A.stakind1 = 2 AND B.stakind1 = 2;
ERROR: cannot compare incompatible arrays
DETAIL: Arrays with element types name and oid[] are not compatible
for comparison.
--
Itagaki Takahiro
2011/1/12 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
On Wed, Jan 12, 2011 at 20:18, Pavel Stehule <pavel.stehule@gmail.com> wrote:
there is one issue - probably useless checking a type equality in
function check_comparable and check_concatinatable, because when your
function is registrated with arguments (anyarray, anyarray), then is
guaranteed so type of array1 is same as type of array2, and then you
don't need to check.It's true for almost all cases, but we have "anyarray" columns in
pg_statistic.stavaluesN. When we pass them to those array functions,
element types of two anyarrays could be different.
I guess they are protections only for them.=# SELECT A.stavalues1 SUBMULTISET OF B.stavalues1
FROM pg_statistic A, pg_statistic B
WHERE A.stakind1 = 2 AND B.stakind1 = 2;
ERROR: cannot compare incompatible arrays
DETAIL: Arrays with element types name and oid[] are not compatible
for comparison.
ook
Pavel
Show quoted text
--
Itagaki Takahiro
2011/1/12 Pavel Stehule <pavel.stehule@gmail.com>:
2011/1/12 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
On Wed, Jan 12, 2011 at 20:18, Pavel Stehule <pavel.stehule@gmail.com> wrote:
there is one issue - probably useless checking a type equality in
function check_comparable and check_concatinatable, because when your
function is registrated with arguments (anyarray, anyarray), then is
guaranteed so type of array1 is same as type of array2, and then you
don't need to check.It's true for almost all cases, but we have "anyarray" columns in
pg_statistic.stavaluesN. When we pass them to those array functions,
element types of two anyarrays could be different.
I guess they are protections only for them.=# SELECT A.stavalues1 SUBMULTISET OF B.stavalues1
FROM pg_statistic A, pg_statistic B
WHERE A.stakind1 = 2 AND B.stakind1 = 2;
ERROR: cannot compare incompatible arrays
DETAIL: Arrays with element types name and oid[] are not compatible
for comparison.ook
so I think it is ready for commit
Regards
Pavel Stehule
Show quoted text
Pavel
--
Itagaki Takahiro
Excerpts from Itagaki Takahiro's message of mié ene 12 01:52:12 -0300 2011:
Separate patches for src and doc attached. It includes a few bug fixes
and cleanup. I changed the error code in trim_array() to
ERRCODE_ARRAY_ELEMENT_ERROR according to the spec.
Two small nitpicks:
+ static void
+ check_concatinatable(Oid element_type1, Oid element_type2)
+ {
+ if (element_type1 != element_type2)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("cannot concatenate incompatible arrays"),
+ errdetail("Arrays with element types %s and %s are not "
+ "compatible for concatenation.",
+ format_type_be(element_type1),
+ format_type_be(element_type2))));
+ }
I think the word is either "concatenable" or "concatenatable". Also
please don't break up the string in errdetail() even if it's longer than
80 chars. (The function below this one has this too)
I didn't read the patch in much more detail.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wed, Jan 12, 2011 at 23:29, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Two small nitpicks: + check_concatinatable(Oid element_type1, Oid element_type2) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("cannot concatenate incompatible arrays"), + errdetail("Arrays with element types %s and %s are not " + "compatible for concatenation.",I think the word is either "concatenable" or "concatenatable". Also
please don't break up the string in errdetail() even if it's longer than
80 chars. (The function below this one has this too)
OK, I'll fix them,
but the broken up messages come from the existing code.
--
Itagaki Takahiro
On Thu, Jan 13, 2011 at 10:40, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
I think the word is either "concatenable" or "concatenatable". Also
please don't break up the string in errdetail() even if it's longer than
80 chars. (The function below this one has this too)OK, I'll fix them,
but the broken up messages come from the existing code.
The attached is a fixed version.
BTW, should we use an "operator" to represent SUBMULTISET OF ?
It is mapped to submultiset_of "function" for now. If GIN and GiST
indexes will support the predicate in the future in addition to <@,
SUBMULTISET OF should be mapped to an operator rather than a function.
We need to choose different operator from <@ and @> for the case
because the semantics are not the same. (ex. <& and &>)
Note that MEMBER OF is represented as "ANY =".
--
Itagaki Takahiro
Attachments:
multiset-20110118.patchapplication/octet-stream; name=multiset-20110118.patchDownload+1603-38
On Tue, Jan 18, 2011 at 17:39, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
BTW, should we use an "operator" to represent SUBMULTISET OF ?
I did it in the attached patch. Also, I fixed a bug of NULL checks
in SUBMULTISET OF operator.
Now SUBMULTISET OF is an alias to the new <& operator. We also have
&> operator as the commutator. They are different from <@ and @>
operators because they considers the number of elements.
For example:
=# SELECT ARRAY[1,1] <@ ARRAY[1], ARRAY[1,1] <& ARRAY[1];
?column? | ?column?
----------+----------
t | f
(1 row)
GIN still doesn't support <& and &> operators because of NULL handling.
In the spec, all values including NULLs should be returned for an empty
array key (i.e, "WHERE ARRAY[] SUBMULTISET OF array_col" returns everything),
but the current GIN implementation won't return NULL values for non-NULL keys.
Since it requires changes in GIN, I'd like to postpone gin support to the next
development cycle for 9.2.
--
Itagaki Takahiro
Attachments:
multiset-20110124.patchapplication/octet-stream; name=multiset-20110124.patchDownload+1647-38
On Mon, Jan 24, 2011 at 2:45 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
[ latest patch ]
I notice that this is adding keywords and syntax support for what is
basically a PostgreSQL extension (since we certainly can't possibly be
following the SQL standards given that we're not implementing a new
datatype. Is that really a good idea?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jan 24, 2011 at 20:49, Robert Haas <robertmhaas@gmail.com> wrote:
I notice that this is adding keywords and syntax support for what is
basically a PostgreSQL extension (since we certainly can't possibly be
following the SQL standards given that we're not implementing a new
datatype. Is that really a good idea?
As I wrote here,
http://archives.postgresql.org/pgsql-hackers/2011-01/msg00829.php
I think we can follow the SQL standard incrementally because we
have function overloads.
One exception is the result type of collect() aggregate function.
It returns an array for now, but will return a multiset when we
support true multiset data type.
--
Itagaki Takahiro
On Mon, Jan 24, 2011 at 7:27 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
On Mon, Jan 24, 2011 at 20:49, Robert Haas <robertmhaas@gmail.com> wrote:
I notice that this is adding keywords and syntax support for what is
basically a PostgreSQL extension (since we certainly can't possibly be
following the SQL standards given that we're not implementing a new
datatype. Is that really a good idea?As I wrote here,
http://archives.postgresql.org/pgsql-hackers/2011-01/msg00829.php
I think we can follow the SQL standard incrementally because we
have function overloads.One exception is the result type of collect() aggregate function.
It returns an array for now, but will return a multiset when we
support true multiset data type.
So, the plan is to add this now with non-standard semantics and then
change the semantics later if and when we implement what the standard
requires? That's not something we usually do, and I don't see why
it's a better idea in this case than it is in general. It's OK to
have non-standard behavior with non-standard syntax, but I think
non-standard behavior with standard syntax is something we want to try
hard to avoid.
I'm in favor of rejecting this patch in its entirety. The
functionality looks useful, but once you remove the syntax support, it
could just as easily be distributed as a contrib module rather than in
core.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2011/1/30 Robert Haas <robertmhaas@gmail.com>:
On Mon, Jan 24, 2011 at 7:27 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:On Mon, Jan 24, 2011 at 20:49, Robert Haas <robertmhaas@gmail.com> wrote:
I notice that this is adding keywords and syntax support for what is
basically a PostgreSQL extension (since we certainly can't possibly be
following the SQL standards given that we're not implementing a new
datatype. Is that really a good idea?As I wrote here,
http://archives.postgresql.org/pgsql-hackers/2011-01/msg00829.php
I think we can follow the SQL standard incrementally because we
have function overloads.One exception is the result type of collect() aggregate function.
It returns an array for now, but will return a multiset when we
support true multiset data type.So, the plan is to add this now with non-standard semantics and then
change the semantics later if and when we implement what the standard
requires? That's not something we usually do, and I don't see why
it's a better idea in this case than it is in general. It's OK to
have non-standard behavior with non-standard syntax, but I think
non-standard behavior with standard syntax is something we want to try
hard to avoid.I'm in favor of rejecting this patch in its entirety. The
functionality looks useful, but once you remove the syntax support, it
could just as easily be distributed as a contrib module rather than in
core.
Hello
It must not be a significant problem with compatibility, because
implemented operators and functions are implemented for arrays.
Functions from this patch are very useful - there are lot of
implementations in SQL language, and this implementation means a
significant speed. I can't to believe so there can be situation, when
pg will has a true support of collection and operations with arrays
will not offer similar functionality. I propose a remove collect()
aggregate, but all others functions and operators can stay.
And if this isn't acceptable for Robert, then I like implementation
of these functions without parser's changes as minimum. Function like
array_sort, array_distinct and some variants array_union are really
missing (should be in core).
Regards
Pavel
Show quoted text
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
So, the plan is to add this now with non-standard semantics and then
change the semantics later if and when we implement what the standard
requires? That's not something we usually do, and I don't see why
it's a better idea in this case than it is in general. It's OK to
have non-standard behavior with non-standard syntax, but I think
non-standard behavior with standard syntax is something we want to try
hard to avoid.
I'm in favor of rejecting this patch in its entirety. The
functionality looks useful, but once you remove the syntax support, it
could just as easily be distributed as a contrib module rather than in
core.
+1 ... if we're going to provide nonstandard behavior, it should be with
a different syntax. Also, with a contrib module we could keep on
providing the nonstandard behavior for people who still need it, even
after implementing the standard properly.
regards, tom lane
On Sun, Jan 30, 2011 at 12:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
So, the plan is to add this now with non-standard semantics and then
change the semantics later if and when we implement what the standard
requires? That's not something we usually do, and I don't see why
it's a better idea in this case than it is in general. It's OK to
have non-standard behavior with non-standard syntax, but I think
non-standard behavior with standard syntax is something we want to try
hard to avoid.I'm in favor of rejecting this patch in its entirety. The
functionality looks useful, but once you remove the syntax support, it
could just as easily be distributed as a contrib module rather than in
core.+1 ... if we're going to provide nonstandard behavior, it should be with
a different syntax. Also, with a contrib module we could keep on
providing the nonstandard behavior for people who still need it, even
after implementing the standard properly.
Good point.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jan 31, 2011 at 02:34, Robert Haas <robertmhaas@gmail.com> wrote:
I'm in favor of rejecting this patch in its entirety. The
functionality looks useful, but once you remove the syntax support, it
could just as easily be distributed as a contrib module rather than in
core.+1 ... if we're going to provide nonstandard behavior, it should be with
a different syntax. Also, with a contrib module we could keep on
providing the nonstandard behavior for people who still need it, even
after implementing the standard properly.Good point.
I agree for collect() function, that is the only function we cannot
provide compatibility when we have MULTISET. But others are still
reasonable because they won't provide nonstandard behavior.
The SQL standard seems to have abstract COLLECTION data type as a
super class of ARRAY and MULTISET. So, it's reasonable that
functions and operators that accept MULTISETs also accept ARRAYs.
For example, we will have cardinality(ARRAY) even if we have
cardinality(MULTISET). Also, trim_array() is in the SQL standard.
I can remove some parts in the patch, especially for parser changes,
but others should be still in the core.
--
Itagaki Takahiro
On Sun, Jan 30, 2011 at 1:46 PM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
On Mon, Jan 31, 2011 at 02:34, Robert Haas <robertmhaas@gmail.com> wrote:
I'm in favor of rejecting this patch in its entirety. The
functionality looks useful, but once you remove the syntax support, it
could just as easily be distributed as a contrib module rather than in
core.+1 ... if we're going to provide nonstandard behavior, it should be with
a different syntax. Also, with a contrib module we could keep on
providing the nonstandard behavior for people who still need it, even
after implementing the standard properly.Good point.
I agree for collect() function, that is the only function we cannot
provide compatibility when we have MULTISET. But others are still
reasonable because they won't provide nonstandard behavior.The SQL standard seems to have abstract COLLECTION data type as a
super class of ARRAY and MULTISET. So, it's reasonable that
functions and operators that accept MULTISETs also accept ARRAYs.
For example, we will have cardinality(ARRAY) even if we have
cardinality(MULTISET). Also, trim_array() is in the SQL standard.I can remove some parts in the patch, especially for parser changes,
but others should be still in the core.
Well, do you want to revise this and submit a stripped-down version?
I'm not averse to adding things that are required by the standard and
won't cause backward compatibility problems later.
The documentation for trim_array() in the current patch version is
pretty terrible. The documentation describes it
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company