Function array_agg(array)
Greetings,
While looking for easier items in PostgreSQL Wiki's Todo List (this will be
my 3rd patch), i found this TODO:
Add a built-in array_agg(anyarray) or similar, that can aggregate
1-dimensional arrays into a 2-dimensional array.
I've stumbled by this lack of array_agg(anyarray) sometimes ago in my work,
so i decided to explore this.
Currently, if we try array_agg(anyarray), PostgreSQL behaves like this:
# select array_agg('{1,2}'::int[]);
ERROR: could not find array type for data type integer[]
Reading implementation of array_agg, it looks like the array_agg function
is generic, and can process any input. The error comes from PostgreSQL not
finding array type for int[] (_int4 in pg_proc).
In PostgreSQL, any array is multidimensional, array type for any array is
the same:
- the type of {1,2} is int[]
- {{1,2}, {3,4}} is int[]
- {{{1},{2}, {3} ,{4}}} is still int[]
So, can't we just set the typarray of array types to its self oid? (patch
attached). So far:
- the array_agg is working and returning correct types:
backend> select array_agg('{1,2}'::int[]);
1: array_agg (typeid = 1007, len = -1, typmod = -1, byval = f)
----
1: array_agg = "{"{1,2}"}" (typeid = 1007, len = -1, typmod = -1,
byval = f)
----
select array_agg('{''a'',''b''}'::varchar[]);
1: array_agg (typeid = 1015, len = -1, typmod = -1, byval = f)
----
1: array_agg = "{"{'a','b'}"}" (typeid = 1015, len = -1, typmod =
-1, byval = f)
----
- Regression tests passed except for the pg_type sanity check while
checking typelem relation with typarray:
SELECT p1.oid, p1.typname as basetype, p2.typname as arraytype,
p2.typelem, p2.typlen
FROM pg_type p1 LEFT JOIN pg_type p2 ON (p1.typarray = p2.oid)
WHERE p1.typarray <> 0 AND
(p2.oid IS NULL OR p2.typelem <> p1.oid OR p2.typlen <> -1);
! oid | basetype | arraytype | typelem | typlen
! ------+----------------+----------------+---------+--------
! 143 | _xml | _xml | 142 | -1
! 199 | _json | _json | 114 | -1
! 629 | _line | _line | 628 | -1
! 719 | _circle | _circle | 718 | -1
... (cut)
Aside from the sanity check complaints, I don't see any problems in the
resulting array operations.
So, back to the question: Can't we just set the typarray of array types to
its self oid?
Regards,
--
Ali Akbar
Attachments:
array_agg_anyarray-1.patchtext/x-diff; charset=US-ASCII; name=array_agg_anyarray-1.patchDownload+68-68
Ali Akbar <the.apaan@gmail.com> writes:
So, can't we just set the typarray of array types to its self oid?
Seems dangerous as heck; certainly it would have side-effects far more
wide-ranging than just making this particular function work.
A safer answer is to split array_agg into two functions,
array_agg(anynonarray) -> anyarray
array_agg(anyarray) -> anyarray
I rather imagine you should do that anyway, because I really doubt
that this hack is operating quite as intended. I suspect you are
producing arrays containing arrays as elements, not true 2-D arrays.
That's not a direction we want to go in I think; certainly there are
no other operations that produce such things.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2014-10-11 22:28 GMT+07:00 Tom Lane <tgl@sss.pgh.pa.us>:
Seems dangerous as heck; certainly it would have side-effects far more
wide-ranging than just making this particular function work.A safer answer is to split array_agg into two functions,
array_agg(anynonarray) -> anyarray
array_agg(anyarray) -> anyarrayI rather imagine you should do that anyway, because I really doubt
that this hack is operating quite as intended. I suspect you are
producing arrays containing arrays as elements, not true 2-D arrays.
That's not a direction we want to go in I think; certainly there are
no other operations that produce such things.
Thanks for the review. Yes, it looks like the patch produced array as the
elements. So, all array operations behaves wierdly.
In this quick & dirty patch, I am trying to implement the
array_agg(anyarray), introducing two new functions:
- array_agg_anyarray_transfn
- array_agg_anyarray_finalfn
At first, i want to use accumArrayResult and makeMdArrayResult, but it's
complicated to work with multi-dimensional arrays with those two functions.
So i combined array_cat with those function.
Currently, it cannot handle NULL arrays:
backend> select array_agg(a) from (values(null::int[])) a(a);
1: array_agg (typeid = 1007, len = -1, typmod = -1, byval = f)
----
ERROR: cannot aggregate null arrays
Regards,
--
Ali Akbar
Attachments:
array_agg_anyarray-2.patchtext/x-diff; charset=US-ASCII; name=array_agg_anyarray-2.patchDownload+284-12
2014-10-12 19:37 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:
Currently, it cannot handle NULL arrays:
backend> select array_agg(a) from (values(null::int[])) a(a);
1: array_agg (typeid = 1007, len = -1, typmod = -1, byval = f)
----
ERROR: cannot aggregate null arrays
While thinking about the function behavior if its input is NULL array (e.g:
NULL:int[]), i've found:
- currentpatch doesn't handle empty array correctly:
- when there is only one array to aggregate, the resulting array is
wrong
- when the first array is empty array, and the second array is also
empty array, it segfaulted
- if we see NULL array as NULL values, the resulting array cannot be
differentiated from array of null ints:
- SELECT array_agg(NULL::int[]) FROM generate_series(1,2); ---> {NULL,
NULL} with type int[]
- SELECT array_agg(NULL::int) FROM generate_series(1,2); --> {NULL,
NULL} with type int[]
Also i've found that handling NULL array is listed as BUG in TODO. The
discussion in the thread is still not finished, with last email from Tom
Lane (/messages/by-id/18866.1226025853@sss.pgh.pa.us):
array_lower raise exception if array is empty (there are no dimensions
to inquire about)
array_upper raise exception if array is empty (there are no dimensions
to inquire about)Well, these beg the question: is an empty array zero-dimensional, or
is it a one-dimensional array of no elements, or perhaps both of those
as well as higher-dimensional cases where any axis has zero elements,
or ???It's really all kind of messy ... we need to trade off simplicity of
definition, ease of use, backwards compatibility, and standards
compliance (though the standard has only 1-D arrays so it's of just
limited help here).
So, is there any idea how we will handle NULL and empty array in
array_agg(anyarray)?
I propose we just reject those input because the output will make no sense:
- array_agg(NULL::int[]) --> the result will be indistinguished from
array_agg of NULL ints.
- array_agg('{}'::int[]) --> how we determine the dimension of the result?
is it 0? Or the result will be just an empty array {} ?
Regards,
--
Ali Akbar
So, is there any idea how we will handle NULL and empty array in
array_agg(anyarray)?
I propose we just reject those input because the output will make no sense:
- array_agg(NULL::int[]) --> the result will be indistinguished from
array_agg of NULL ints.
- array_agg('{}'::int[]) --> how we determine the dimension of the result?
is it 0? Or the result will be just an empty array {} ?
This updated patch rejects NULL and {} arrays as noted above.
Regards,
--
Ali Akbar
Attachments:
array_agg_anyarray-3.patchtext/x-patch; charset=US-ASCII; name=array_agg_anyarray-3.patchDownload+291-12
Hi
2014-10-19 8:02 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
So, is there any idea how we will handle NULL and empty array in
array_agg(anyarray)?
I propose we just reject those input because the output will make no
sense:
- array_agg(NULL::int[]) --> the result will be indistinguished from
array_agg of NULL ints.
- array_agg('{}'::int[]) --> how we determine the dimension of the
result? is it 0? Or the result will be just an empty array {} ?This updated patch rejects NULL and {} arrays as noted above.
I agree with your proposal. I have a few comments to design:
1. patch doesn't hold documentation and regress tests, please append it.
2. this functionality (multidimensional aggregation) can be interesting
more times, so maybe some interface like array builder should be preferred.
3. array_agg was consistent with array(subselect), so it should be fixed too
postgres=# select array_agg(a) from test;
array_agg
-----------------------
{{1,2,3,4},{1,2,3,4}}
(1 row)
postgres=# select array(select a from test);
ERROR: could not find array type for data type integer[]
4. why you use a magic constant (64) there?
+ astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+ astate->aitems = 64 * nitems;
+ astate->nullbitmap = (bits8 *)
+ repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);
Regards
Pavel
Show quoted text
Regards,
--
Ali Akbar--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks for the review
2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:
1. patch doesn't hold documentation and regress tests, please append it.
OK, i'll add the documentation and regression test
2. this functionality (multidimensional aggregation) can be interesting
more times, so maybe some interface like array builder should be preferred.
We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
haven't we?
Actually array_agg(anyarray) can be implemented by using accumArrayResult
and makeMdArrayResult, but in the process we will deconstruct the array
data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we
will reconstruct it through makeMdArrayResult. I think this way it's not
efficient, so i decided to reimplement it and memcpy the array data and
null flags as-is.
In other places, i think it's clearer if we just use accumArrayResult and
makeArrayResult/makeMdArrayResult as API for building (multidimensional)
arrays.
3. array_agg was consistent with array(subselect), so it should be fixed
toopostgres=# select array_agg(a) from test;
array_agg
-----------------------
{{1,2,3,4},{1,2,3,4}}
(1 row)postgres=# select array(select a from test);
ERROR: could not find array type for data type integer[]
I'm pretty new in postgresql development. Can you point out where is
array(subselect) implemented?
4. why you use a magic constant (64) there?
+ astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); + astate->aitems = 64 * nitems;+ astate->nullbitmap = (bits8 *) + repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);
just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */
it can be any number not too small and too big. Too small, and we will
realloc shortly. Too big, we will end up wasting memory.
Regards,
--
Ali Akbar
2014-10-22 16:58 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
Thanks for the review
2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:
1. patch doesn't hold documentation and regress tests, please append it.
OK, i'll add the documentation and regression test
2. this functionality (multidimensional aggregation) can be interesting
more times, so maybe some interface like array builder should be preferred.We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
haven't we?Actually array_agg(anyarray) can be implemented by using accumArrayResult
and makeMdArrayResult, but in the process we will deconstruct the array
data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we
will reconstruct it through makeMdArrayResult. I think this way it's not
efficient, so i decided to reimplement it and memcpy the array data and
null flags as-is.
aha, so isn't better to fix a performance for accumArrayResult ?
In other places, i think it's clearer if we just use accumArrayResult and
makeArrayResult/makeMdArrayResult as API for building (multidimensional)
arrays.3. array_agg was consistent with array(subselect), so it should be fixed
toopostgres=# select array_agg(a) from test;
array_agg
-----------------------
{{1,2,3,4},{1,2,3,4}}
(1 row)postgres=# select array(select a from test);
ERROR: could not find array type for data type integer[]I'm pretty new in postgresql development. Can you point out where is
array(subselect) implemented?
where you can start?
postgres=# explain select array(select a from test);
ERROR: 42704: could not find array type for data type integer[]
LOCATION: exprType, nodeFuncs.c:116
look to code ... your magic keyword is a ARRAY_SUBLINK .. search in
postgresql sources this keyword
4. why you use a magic constant (64) there?
+ astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); + astate->aitems = 64 * nitems;+ astate->nullbitmap = (bits8 *) + repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */it can be any number not too small and too big. Too small, and we will
realloc shortly. Too big, we will end up wasting memory.
you can try to alloc 1KB instead as start -- it is used more times in
Postgres. Then a overhead is max 1KB per agg call - what is acceptable.
You take this value from accumArrayResult, but it is targeted for shorted
scalars - you should to expect so any array will be much larger.
Show quoted text
Regards,
--
Ali Akbar
2014-10-22 22:48 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
2014-10-22 16:58 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
Thanks for the review
2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:
1. patch doesn't hold documentation and regress tests, please append it.
OK, i'll add the documentation and regression test
2. this functionality (multidimensional aggregation) can be interesting
more times, so maybe some interface like array builder should be preferred.We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
haven't we?Actually array_agg(anyarray) can be implemented by using accumArrayResult
and makeMdArrayResult, but in the process we will deconstruct the array
data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we
will reconstruct it through makeMdArrayResult. I think this way it's not
efficient, so i decided to reimplement it and memcpy the array data and
null flags as-is.aha, so isn't better to fix a performance for accumArrayResult ?
Ok, i'll go this route. While reading the code of array(subselect) as you
pointed below, i think the easiest way to implement support for both
array_agg(anyarray) and array(subselect) is to change accumArrayResult so
it operates both with scalar datum(s) and array datums, with performance
optimization for the latter.
In other places, i think it's clearer if we just use accumArrayResult and
makeArrayResult/makeMdArrayResult as API for building (multidimensional)
arrays.3. array_agg was consistent with array(subselect), so it should be fixed
toopostgres=# select array_agg(a) from test;
array_agg
-----------------------
{{1,2,3,4},{1,2,3,4}}
(1 row)postgres=# select array(select a from test);
ERROR: could not find array type for data type integer[]I'm pretty new in postgresql development. Can you point out where is
array(subselect) implemented?where you can start?
postgres=# explain select array(select a from test);
ERROR: 42704: could not find array type for data type integer[]
LOCATION: exprType, nodeFuncs.c:116look to code ... your magic keyword is a ARRAY_SUBLINK .. search in
postgresql sources this keyword
Found it. Thanks.
On a side note in postgres array type for integer[] is still integer[], but
in pg_type, integer[] has no array type. I propose we consider to change it
so array type of anyarray is itself (not in this patch, of course, because
it is a big change). Consider the following code in nodeFuncs.c:109
if (sublink->subLinkType == ARRAY_SUBLINK)
{
type = get_array_type(type);
if (!OidIsValid(type))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("could not find array type for data type %s",
format_type_be(exprType((Node *) tent->expr)))));
}
to implement array(subselect) you pointed above, we must specially checks
for array types. Quick search on get_array_type returns 10 places.
I'll think more about this later. For this patch, i'll go without changes
in pg_type.h.
4. why you use a magic constant (64) there?
+ astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); + astate->aitems = 64 * nitems;+ astate->nullbitmap = (bits8 *) + repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */it can be any number not too small and too big. Too small, and we will
realloc shortly. Too big, we will end up wasting memory.you can try to alloc 1KB instead as start -- it is used more times in
Postgres. Then a overhead is max 1KB per agg call - what is acceptable.You take this value from accumArrayResult, but it is targeted for shorted
scalars - you should to expect so any array will be much larger.
Ok.
Regards,
--
Ali Akbar
2014-10-23 4:00 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
2014-10-22 22:48 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
2014-10-22 16:58 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
Thanks for the review
2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:
1. patch doesn't hold documentation and regress tests, please append it.
OK, i'll add the documentation and regression test
2. this functionality (multidimensional aggregation) can be interesting
more times, so maybe some interface like array builder should be preferred.We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
haven't we?Actually array_agg(anyarray) can be implemented by using
accumArrayResult and makeMdArrayResult, but in the process we will
deconstruct the array data and null bit-flags into ArrayBuildState->dvalues
and dnulls. And we will reconstruct it through makeMdArrayResult. I think
this way it's not efficient, so i decided to reimplement it and memcpy the
array data and null flags as-is.aha, so isn't better to fix a performance for accumArrayResult ?
Ok, i'll go this route. While reading the code of array(subselect) as you
pointed below, i think the easiest way to implement support for both
array_agg(anyarray) and array(subselect) is to change accumArrayResult so
it operates both with scalar datum(s) and array datums, with performance
optimization for the latter.In other places, i think it's clearer if we just use accumArrayResult and
makeArrayResult/makeMdArrayResult as API for building (multidimensional)
arrays.3. array_agg was consistent with array(subselect), so it should be
fixed toopostgres=# select array_agg(a) from test;
array_agg
-----------------------
{{1,2,3,4},{1,2,3,4}}
(1 row)postgres=# select array(select a from test);
ERROR: could not find array type for data type integer[]I'm pretty new in postgresql development. Can you point out where is
array(subselect) implemented?where you can start?
postgres=# explain select array(select a from test);
ERROR: 42704: could not find array type for data type integer[]
LOCATION: exprType, nodeFuncs.c:116look to code ... your magic keyword is a ARRAY_SUBLINK .. search in
postgresql sources this keywordFound it. Thanks.
On a side note in postgres array type for integer[] is still integer[],
but in pg_type, integer[] has no array type. I propose we consider to
change it so array type of anyarray is itself (not in this patch, of
course, because it is a big change). Consider the following code in
nodeFuncs.c:109
yes, it is true - this is really big change and maybe needs separate
discuss - ***if we allow cycle there. I am not sure about possible side
effects***.
Maybe this change is not necessary, you can fix a check only ... "if type
is not array or if get_array_type is null raise a error"
if (sublink->subLinkType == ARRAY_SUBLINK)
{
type = get_array_type(type);
if (!OidIsValid(type))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("could not find array type for data type %s",
format_type_be(exprType((Node *) tent->expr)))));
}to implement array(subselect) you pointed above, we must specially checks
for array types. Quick search on get_array_type returns 10 places.
attention: probably we don't would to allow arrays everywhere.
Show quoted text
I'll think more about this later. For this patch, i'll go without changes
in pg_type.h.4. why you use a magic constant (64) there?
+ astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); + astate->aitems = 64 * nitems;+ astate->nullbitmap = (bits8 *) + repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);just follow the arbitrary size choosen in accumArrayResult
(arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */it can be any number not too small and too big. Too small, and we will
realloc shortly. Too big, we will end up wasting memory.you can try to alloc 1KB instead as start -- it is used more times in
Postgres. Then a overhead is max 1KB per agg call - what is acceptable.You take this value from accumArrayResult, but it is targeted for shorted
scalars - you should to expect so any array will be much larger.Ok.
Regards,
--
Ali Akbar
Updated patch attached.
2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:
1. patch doesn't hold documentation and regress tests, please append
it.
i've added some regression tests in arrays.sql and aggregate.sql.
I've only found the documentation of array_agg. Where is the doc for
array(subselect) defined?
2. this functionality (multidimensional aggregation) can be interesting
more times, so maybe some interface like array builder should be preferred.
We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
haven't we?Actually array_agg(anyarray) can be implemented by using
accumArrayResult and makeMdArrayResult, but in the process we will
deconstruct the array data and null bit-flags into ArrayBuildState->dvalues
and dnulls. And we will reconstruct it through makeMdArrayResult. I think
this way it's not efficient, so i decided to reimplement it and memcpy the
array data and null flags as-is.aha, so isn't better to fix a performance for accumArrayResult ?
Ok, i'll go this route. While reading the code of array(subselect) as you
pointed below, i think the easiest way to implement support for both
array_agg(anyarray) and array(subselect) is to change accumArrayResult so
it operates both with scalar datum(s) and array datums, with performance
optimization for the latter.
implemented it by modifying ArrayBuildState to ArrayBuildStateArray and
ArrayBuildStateScalar (do you have any idea for better struct naming?)
In other places, i think it's clearer if we just use accumArrayResult and
makeArrayResult/makeMdArrayResult as API for building (multidimensional)
arrays.3. array_agg was consistent with array(subselect), so it should be
fixed toopostgres=# select array_agg(a) from test;
array_agg
-----------------------
{{1,2,3,4},{1,2,3,4}}
(1 row)postgres=# select array(select a from test);
ERROR: could not find array type for data type integer[]I'm pretty new in postgresql development. Can you point out where is
array(subselect) implemented?where you can start?
postgres=# explain select array(select a from test);
ERROR: 42704: could not find array type for data type integer[]
LOCATION: exprType, nodeFuncs.c:116look to code ... your magic keyword is a ARRAY_SUBLINK .. search in
postgresql sources this keywordattention: probably we don't would to allow arrays everywhere.
I've changed the places where i think it's appropriate.
4. why you use a magic constant (64) there?
+ astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); + astate->aitems = 64 * nitems;+ astate->nullbitmap = (bits8 *) + repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);just follow the arbitrary size choosen in accumArrayResult
(arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */it can be any number not too small and too big. Too small, and we will
realloc shortly. Too big, we will end up wasting memory.you can try to alloc 1KB instead as start -- it is used more times in
Postgres. Then a overhead is max 1KB per agg call - what is acceptable.You take this value from accumArrayResult, but it is targeted for
shorted scalars - you should to expect so any array will be much larger.
this patch allocates 1KB (1024 bytes) if the ndatabytes is < 512bytes. If
it is larger, it allocates 4 * size. For nullbitmap, it allocates 4 *
number of items in array.
Regards,
--
Ali Akbar
Attachments:
array_agg_anyarray-4.patchtext/x-patch; charset=US-ASCII; name=array_agg_anyarray-4.patchDownload+191-93
Hi
it looks well
doc:
http://www.postgresql.org/docs/9.4/static/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS
it should be fixed too
Regards
Pavel
2014-10-24 10:24 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
Show quoted text
Updated patch attached.
2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:
1. patch doesn't hold documentation and regress tests, please append
it.i've added some regression tests in arrays.sql and aggregate.sql.
I've only found the documentation of array_agg. Where is the doc for
array(subselect) defined?2. this functionality (multidimensional aggregation) can be interesting
more times, so maybe some interface like array builder should be preferred.
We already have accumArrayResult and
makeArrayResult/makeMdArrayResult, haven't we?Actually array_agg(anyarray) can be implemented by using
accumArrayResult and makeMdArrayResult, but in the process we will
deconstruct the array data and null bit-flags into ArrayBuildState->dvalues
and dnulls. And we will reconstruct it through makeMdArrayResult. I think
this way it's not efficient, so i decided to reimplement it and memcpy the
array data and null flags as-is.aha, so isn't better to fix a performance for accumArrayResult ?
Ok, i'll go this route. While reading the code of array(subselect) as
you pointed below, i think the easiest way to implement support for both
array_agg(anyarray) and array(subselect) is to change accumArrayResult so
it operates both with scalar datum(s) and array datums, with performance
optimization for the latter.implemented it by modifying ArrayBuildState to ArrayBuildStateArray and
ArrayBuildStateScalar (do you have any idea for better struct naming?)In other places, i think it's clearer if we just use accumArrayResult and
makeArrayResult/makeMdArrayResult as API for building (multidimensional)
arrays.3. array_agg was consistent with array(subselect), so it should be
fixed toopostgres=# select array_agg(a) from test;
array_agg
-----------------------
{{1,2,3,4},{1,2,3,4}}
(1 row)postgres=# select array(select a from test);
ERROR: could not find array type for data type integer[]I'm pretty new in postgresql development. Can you point out where is
array(subselect) implemented?where you can start?
postgres=# explain select array(select a from test);
ERROR: 42704: could not find array type for data type integer[]
LOCATION: exprType, nodeFuncs.c:116look to code ... your magic keyword is a ARRAY_SUBLINK .. search in
postgresql sources this keywordattention: probably we don't would to allow arrays everywhere.
I've changed the places where i think it's appropriate.
4. why you use a magic constant (64) there?
+ astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); + astate->aitems = 64 * nitems;+ astate->nullbitmap = (bits8 *) + repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);just follow the arbitrary size choosen in accumArrayResult
(arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */it can be any number not too small and too big. Too small, and we will
realloc shortly. Too big, we will end up wasting memory.you can try to alloc 1KB instead as start -- it is used more times in
Postgres. Then a overhead is max 1KB per agg call - what is acceptable.You take this value from accumArrayResult, but it is targeted for
shorted scalars - you should to expect so any array will be much larger.this patch allocates 1KB (1024 bytes) if the ndatabytes is < 512bytes. If
it is larger, it allocates 4 * size. For nullbitmap, it allocates 4 *
number of items in array.Regards,
--
Ali Akbar
2014-10-24 15:48 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi
it looks well
doc:
http://www.postgresql.org/docs/9.4/static/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS
it should be fixed tooRegards
Pavel
doc updated with additional example for array(subselect). patch attached.
Regards,
--
Ali Akbar
Attachments:
array_agg_anyarray-5.patchtext/x-patch; charset=US-ASCII; name=array_agg_anyarray-5.patchDownload+196-93
Hi
some in last patch is wrong, I cannot to compile it:
arrayfuncs.c: In function ‘accumArrayResult’:
arrayfuncs.c:4603:9: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->alen = 64; /* arbitrary starting array size */
^
arrayfuncs.c:4604:9: error: ‘ArrayBuildState’ has no member named ‘dvalues’
astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
^
arrayfuncs.c:4604:44: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
^
arrayfuncs.c:4605:9: error: ‘ArrayBuildState’ has no member named ‘dnulls’
astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
^
arrayfuncs.c:4605:42: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
^
arrayfuncs.c:4606:9: error: ‘ArrayBuildState’ has no member named ‘nelems’
astate->nelems = 0;
^
arrayfuncs.c:4618:13: error: ‘ArrayBuildState’ has no member named ‘nelems’
if (astate->nelems >= astate->alen)
^
arrayfuncs.c:4618:31: error: ‘ArrayBuildState’ has no member named ‘alen’
if (astate->nelems >= astate->alen)
^
arrayfuncs.c:4620:10: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->alen *= 2;
2014-10-24 11:24 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
Show quoted text
2014-10-24 15:48 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi
it looks well
doc:
http://www.postgresql.org/docs/9.4/static/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS
it should be fixed tooRegards
Pavel
doc updated with additional example for array(subselect). patch attached.
Regards,
--
Ali Akbar
2014-10-24 16:26 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi
some in last patch is wrong, I cannot to compile it:
arrayfuncs.c: In function ‘accumArrayResult’:
arrayfuncs.c:4603:9: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->alen = 64; /* arbitrary starting array size */
^
arrayfuncs.c:4604:9: error: ‘ArrayBuildState’ has no member named ‘dvalues’
astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
^
arrayfuncs.c:4604:44: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
^
arrayfuncs.c:4605:9: error: ‘ArrayBuildState’ has no member named ‘dnulls’
astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
^
arrayfuncs.c:4605:42: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
^
arrayfuncs.c:4606:9: error: ‘ArrayBuildState’ has no member named ‘nelems’
astate->nelems = 0;
^
arrayfuncs.c:4618:13: error: ‘ArrayBuildState’ has no member named ‘nelems’
if (astate->nelems >= astate->alen)
^
arrayfuncs.c:4618:31: error: ‘ArrayBuildState’ has no member named ‘alen’
if (astate->nelems >= astate->alen)
^
arrayfuncs.c:4620:10: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->alen *= 2;
Sorry, correct patch attached.
This patch is in patience format (git --patience ..). In previous patches,
i use context format (git --patience ... | filterdiff --format=context),
but it turns out that some modification is lost.
--
Ali Akbar
Attachments:
array_agg_anyarray-6.patchtext/x-patch; charset=US-ASCII; name=array_agg_anyarray-6.patchDownload+485-78
On Fri, Oct 24, 2014 at 11:43 AM, Ali Akbar <the.apaan@gmail.com> wrote:
2014-10-24 16:26 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi
some in last patch is wrong, I cannot to compile it:
arrayfuncs.c: In function 'accumArrayResult':
arrayfuncs.c:4603:9: error: 'ArrayBuildState' has no member named 'alen'
astate->alen = 64; /* arbitrary starting array size */
^
arrayfuncs.c:4604:9: error: 'ArrayBuildState' has no member named
'dvalues'
astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
^
arrayfuncs.c:4604:44: error: 'ArrayBuildState' has no member named 'alen'
astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
^
arrayfuncs.c:4605:9: error: 'ArrayBuildState' has no member named 'dnulls'
astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
^
arrayfuncs.c:4605:42: error: 'ArrayBuildState' has no member named 'alen'
astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
^
arrayfuncs.c:4606:9: error: 'ArrayBuildState' has no member named 'nelems'
astate->nelems = 0;
^
arrayfuncs.c:4618:13: error: 'ArrayBuildState' has no member named
'nelems'
if (astate->nelems >= astate->alen)
^
arrayfuncs.c:4618:31: error: 'ArrayBuildState' has no member named 'alen'
if (astate->nelems >= astate->alen)
^
arrayfuncs.c:4620:10: error: 'ArrayBuildState' has no member named 'alen'
astate->alen *= 2;Sorry, correct patch attached.
This patch is in patience format (git --patience ..). In previous patches,
i use context format (git --patience ... | filterdiff --format=context),
but it turns out that some modification is lost.
That's not surprising, sometimes filterdiff misses the shot.
--
Michael
2014-10-24 11:43 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
2014-10-24 16:26 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi
some in last patch is wrong, I cannot to compile it:
arrayfuncs.c: In function ‘accumArrayResult’:
arrayfuncs.c:4603:9: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->alen = 64; /* arbitrary starting array size */
^
arrayfuncs.c:4604:9: error: ‘ArrayBuildState’ has no member named
‘dvalues’
astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
^
arrayfuncs.c:4604:44: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
^
arrayfuncs.c:4605:9: error: ‘ArrayBuildState’ has no member named ‘dnulls’
astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
^
arrayfuncs.c:4605:42: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
^
arrayfuncs.c:4606:9: error: ‘ArrayBuildState’ has no member named ‘nelems’
astate->nelems = 0;
^
arrayfuncs.c:4618:13: error: ‘ArrayBuildState’ has no member named
‘nelems’
if (astate->nelems >= astate->alen)
^
arrayfuncs.c:4618:31: error: ‘ArrayBuildState’ has no member named ‘alen’
if (astate->nelems >= astate->alen)
^
arrayfuncs.c:4620:10: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->alen *= 2;Sorry, correct patch attached.
This patch is in patience format (git --patience ..). In previous patches,
i use context format (git --patience ... | filterdiff --format=context),
but it turns out that some modification is lost.
last version is compileable, but some is still broken
postgres=# select array_agg(array[i, i+1, i-1])
from generate_series(1,2) a(i);
ERROR: could not find array type for data type integer[]
but array(subselect) works
postgres=# select array(select a from xx);
array
-------------------
{{1,2,3},{1,2,3}}
(1 row)
Regards
Pavel
Show quoted text
--
Ali Akbar
Hi
I did some performance tests and it is interesting:
it is about 15% faster than original implementation.
Regards
Pavel
2014-10-24 13:58 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
Show quoted text
2014-10-24 11:43 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
2014-10-24 16:26 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi
some in last patch is wrong, I cannot to compile it:
arrayfuncs.c: In function ‘accumArrayResult’:
arrayfuncs.c:4603:9: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->alen = 64; /* arbitrary starting array size */
^
arrayfuncs.c:4604:9: error: ‘ArrayBuildState’ has no member named
‘dvalues’
astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
^
arrayfuncs.c:4604:44: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
^
arrayfuncs.c:4605:9: error: ‘ArrayBuildState’ has no member named
‘dnulls’
astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
^
arrayfuncs.c:4605:42: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
^
arrayfuncs.c:4606:9: error: ‘ArrayBuildState’ has no member named
‘nelems’
astate->nelems = 0;
^
arrayfuncs.c:4618:13: error: ‘ArrayBuildState’ has no member named
‘nelems’
if (astate->nelems >= astate->alen)
^
arrayfuncs.c:4618:31: error: ‘ArrayBuildState’ has no member named ‘alen’
if (astate->nelems >= astate->alen)
^
arrayfuncs.c:4620:10: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->alen *= 2;Sorry, correct patch attached.
This patch is in patience format (git --patience ..). In previous
patches, i use context format (git --patience ... | filterdiff
--format=context), but it turns out that some modification is lost.last version is compileable, but some is still broken
postgres=# select array_agg(array[i, i+1, i-1])
from generate_series(1,2) a(i);
ERROR: could not find array type for data type integer[]but array(subselect) works
postgres=# select array(select a from xx);
array
-------------------
{{1,2,3},{1,2,3}}
(1 row)Regards
Pavel
--
Ali Akbar
Michael Paquier wrote:
That's not surprising, sometimes filterdiff misses the shot.
Really? Wow, that's bad news. I've been using it to submit patches
from time to time ...
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2014-10-24 13:58 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
2014-10-24 11:43 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
2014-10-24 16:26 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi
some in last patch is wrong, I cannot to compile it:
arrayfuncs.c: In function ‘accumArrayResult’:
arrayfuncs.c:4603:9: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->alen = 64; /* arbitrary starting array size */
^
arrayfuncs.c:4604:9: error: ‘ArrayBuildState’ has no member named
‘dvalues’
astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
^
arrayfuncs.c:4604:44: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
^
arrayfuncs.c:4605:9: error: ‘ArrayBuildState’ has no member named
‘dnulls’
astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
^
arrayfuncs.c:4605:42: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
^
arrayfuncs.c:4606:9: error: ‘ArrayBuildState’ has no member named
‘nelems’
astate->nelems = 0;
^
arrayfuncs.c:4618:13: error: ‘ArrayBuildState’ has no member named
‘nelems’
if (astate->nelems >= astate->alen)
^
arrayfuncs.c:4618:31: error: ‘ArrayBuildState’ has no member named ‘alen’
if (astate->nelems >= astate->alen)
^
arrayfuncs.c:4620:10: error: ‘ArrayBuildState’ has no member named ‘alen’
astate->alen *= 2;Sorry, correct patch attached.
This patch is in patience format (git --patience ..). In previous
patches, i use context format (git --patience ... | filterdiff
--format=context), but it turns out that some modification is lost.last version is compileable, but some is still broken
postgres=# select array_agg(array[i, i+1, i-1])
from generate_series(1,2) a(i);
ERROR: could not find array type for data type integer[]
I am sorry, it works - I had a problem with broken database
I fixed small issue in regress tests and I enhanced tests for varlena types
and null values.
Regards
Pavel
Show quoted text
but array(subselect) works
postgres=# select array(select a from xx);
array
-------------------
{{1,2,3},{1,2,3}}
(1 row)Regards
Pavel
--
Ali Akbar