proposal: FOREACH-IN-ARRAY (probably for 9.2?)
Hello
I am resending a redesigned proposal about special plpgsql statement
that support iteration over an array.
The most conflict issue of last proposal was a syntax. It enhanced
relative complex FOR statement. So now, it's based on new statement
with simple syntax. We can use a keyword FOREACH, this isn't in
conflict with PL/SQL - use a keyword FORALL and it isn't in conflict
with SQL/PSM too. More - this special statement can be used for
PostgreSQL's specific purposes. It can carry a new features in future.
The design of proposed functionality is simple, but respects a
possibility for enhancing a FOREACH cycle for future.
==proposed syntax:==
[ <<label>> ]
FOREACH var [, var [..]] IN ARRAY expr
LOOP
...
END LOOP [ label ]
==the goals:==
* cleaner syntax for full iteration over array
* reduce a overhead from only seq. access to any field in array
(it's not too significant)
* simplify iteration over multidimensional arrays
The most performance issue of access to a untoasted array is "solved"
with other patch.
== Iteration over multidimensional arrays ==
Its designed to reduce one dimension from source array. It can remove
a slicing and simplify code:
CREATE OR REPLACE FUNCTION public.fa(anyarray)
RETURNS void
LANGUAGE plpgsql
AS $function$
DECLARE i int[];
BEGIN
FOREACH i IN ARRAY $1
LOOP
RAISE NOTICE '%', i;
END LOOP;
END;
$function$
postgres=# select fa(array[[[1,2],[3,4]],[[1,2],[3,4]],[[5,6],[7,8]]]);
NOTICE: {{1,2},{3,4}}
NOTICE: {{1,2},{3,4}}
NOTICE: {{5,6},{7,8}}
fa
----
(1 row)
postgres=# select fa(array[[1,2,3,4],[1,2,3,4],[5,6,7,8]]);
NOTICE: {1,2,3,4}
NOTICE: {1,2,3,4}
NOTICE: {5,6,7,8}
fa
----
(1 row)
ideas, notes?
Regards
Pavel
Attachments:
foreach-in-array.difftext/x-patch; charset=US-ASCII; name=foreach-in-array.diffDownload+290-2
Excerpts from Pavel Stehule's message of jue dic 16 16:19:17 -0300 2010:
The most performance issue of access to a untoasted array is "solved"
with other patch.
Was the other patch applied?
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2010/12/17 Alvaro Herrera <alvherre@commandprompt.com>:
Excerpts from Pavel Stehule's message of jue dic 16 16:19:17 -0300 2010:
The most performance issue of access to a untoasted array is "solved"
with other patch.Was the other patch applied?
no, it's in queue for next commitfest
https://commitfest.postgresql.org/action/patch_view?id=440
Regards
Pavel
Show quoted text
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Pavel Stehule <pavel.stehule@gmail.com> writes:
I am resending a redesigned proposal about special plpgsql statement
that support iteration over an array.
OK ...
== Iteration over multidimensional arrays ==
Its designed to reduce one dimension from source array. It can remove
a slicing and simplify code:
This seems like a really bad, confusing idea. I think it should throw
a type-mismatch error in this case. If there is any use-case for such a
thing, which I'm quite unconvinced of, it ought to use a separate syntax
rather than overloading the element-by-element syntax.
regards, tom lane
On Fri, Dec 17, 2010 at 10:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
I am resending a redesigned proposal about special plpgsql statement
that support iteration over an array.OK ...
== Iteration over multidimensional arrays ==
Its designed to reduce one dimension from source array. It can remove
a slicing and simplify code:This seems like a really bad, confusing idea. I think it should throw
a type-mismatch error in this case. If there is any use-case for such a
thing, which I'm quite unconvinced of, it ought to use a separate syntax
rather than overloading the element-by-element syntax.
I don't agree at all -- iterating arrays by slice is a frequently
requested feature (you can kinda sorta do it by slice notation, but
arr[n] giving null is a -general FAQ. This is how people think arrays
should work. I suppose that having this functionality reserved in a
tiny corner of plpgsql is not so good, but I think foreach... would
become the preferred way to iterate arrays always.
merlin
2010/12/17 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
I am resending a redesigned proposal about special plpgsql statement
that support iteration over an array.OK ...
== Iteration over multidimensional arrays ==
Its designed to reduce one dimension from source array. It can remove
a slicing and simplify code:This seems like a really bad, confusing idea. I think it should throw
a type-mismatch error in this case. If there is any use-case for such a
thing, which I'm quite unconvinced of, it ought to use a separate syntax
rather than overloading the element-by-element syntax.
Without this feature any iteration over 2d and more dimensional array
is not practical. If I have a 2D array, then I would to get a vector.
Access to individual values can be to limiting, because I need a more
cycles to get a complete vector. Usually I can use a array of row
instead a 2d array, but still and in feature there is problem with
iteration over row. So sometime is more practical to use a 2d array.
Actually It raise a type mismatch error, when a user used a scalar
variable and data is a vector (array)
Pavel
Show quoted text
regards, tom lane
Merlin Moncure <mmoncure@gmail.com> writes:
On Fri, Dec 17, 2010 at 10:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
This seems like a really bad, confusing idea. �I think it should throw
a type-mismatch error in this case. �If there is any use-case for such a
thing, which I'm quite unconvinced of, it ought to use a separate syntax
rather than overloading the element-by-element syntax.
I don't agree at all -- iterating arrays by slice is a frequently
requested feature (you can kinda sorta do it by slice notation, but
arr[n] giving null is a -general FAQ. This is how people think arrays
should work. I suppose that having this functionality reserved in a
tiny corner of plpgsql is not so good, but I think foreach... would
become the preferred way to iterate arrays always.
Well, okay, if it's useful we can have it, but I still say it needs to
be a separate syntax. The example Pavel gives looks like nothing so
much as a beginner's error, ie putting [] on the target variable when
he shouldn't have.
Furthermore, it's underspecified: who's to say how many dimensions of
the array are supposed to get sliced off? There's no reasonable place
to extend this syntax to specify that. It will also be inconsistent
for "foreach scalar in array" to iterate element-by-element no matter
how many dimensions array has, while "foreach array in array" does
something different from that.
regards, tom lane
2010/12/17 Tom Lane <tgl@sss.pgh.pa.us>:
Merlin Moncure <mmoncure@gmail.com> writes:
On Fri, Dec 17, 2010 at 10:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
This seems like a really bad, confusing idea. I think it should throw
a type-mismatch error in this case. If there is any use-case for such a
thing, which I'm quite unconvinced of, it ought to use a separate syntax
rather than overloading the element-by-element syntax.I don't agree at all -- iterating arrays by slice is a frequently
requested feature (you can kinda sorta do it by slice notation, but
arr[n] giving null is a -general FAQ. This is how people think arrays
should work. I suppose that having this functionality reserved in a
tiny corner of plpgsql is not so good, but I think foreach... would
become the preferred way to iterate arrays always.Well, okay, if it's useful we can have it, but I still say it needs to
be a separate syntax. The example Pavel gives looks like nothing so
much as a beginner's error, ie putting [] on the target variable when
he shouldn't have.
Now the message is unclean - but it can be enhanced. We can a diagnose
situation when result is multidimensional array and target isn't
array, and the we can to throw user friendly message.
Furthermore, it's underspecified: who's to say how many dimensions of
the array are supposed to get sliced off? There's no reasonable place
to extend this syntax to specify that. It will also be inconsistent
for "foreach scalar in array" to iterate element-by-element no matter
how many dimensions array has, while "foreach array in array" does
something different from that.
it reduce just one dimension. Now I expect, and I think so it is
correct, so user knows a used dimension. Just doesn't know a data. So
user can to decide and fill correct type. The design strictly remove
any U.I. from design. So using a incorect type is bug.
Because a FOREACH syntax is new, we can to enhance it to possible direction:
FOREACH VALUE var IN ARRAY expr
LOOP
END LOOP
and then it will iterate per one field without a dimension reduction.
So this possibility is available and I think so could be implemented
too.
Pavel
Show quoted text
regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
2010/12/17 Tom Lane <tgl@sss.pgh.pa.us>:
Furthermore, it's underspecified: who's to say how many dimensions of
the array are supposed to get sliced off? There's no reasonable place
to extend this syntax to specify that. It will also be inconsistent
for "foreach scalar in array" to iterate element-by-element no matter
how many dimensions array has, while "foreach array in array" does
something different from that.
it reduce just one dimension. Now I expect, and I think so it is
correct, so user knows a used dimension. Just doesn't know a data. So
user can to decide and fill correct type. The design strictly remove
any U.I. from design. So using a incorect type is bug.
In other words, your proposal is error-prone to use, restricted in what
it can do, and incapable of being extended later without breaking
things. If there is some redeeming social value to set against those
problems, I'm not seeing it.
What I think we should have is
FOREACH scalar-variable IN ARRAY array-expression
which iterates element by element regardless of how many dimensions the
array has. Then there should be some other syntax for iterating over
slices, and we should give some thought to being able to specify how
"deep" the slice is. I can definitely think of use cases for pulling
off either 1 dimension at a time (so you get vectors) or N-1 dimensions
at a time, and it's not out of the realm of reason to want intermediate
cases.
Maybe
FOR_EACH scalar-variable IN ARRAY array-expression
FOR_SLICE array-variable [DEPTH n] IN ARRAY array-expression
Or I guess you could use the same leading keyword if you make the depth
specification mandatory for the slice case:
FOREACH scalar-variable IN ARRAY array-expression
FOREACH array-variable SLICE n IN ARRAY array-expression
That might be a better idea since it avoids the inevitable argument over
whether the default slice depth should be 1 dimension or N-1 dimensions.
regards, tom lane
On Fri, Dec 17, 2010 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
2010/12/17 Tom Lane <tgl@sss.pgh.pa.us>:
Furthermore, it's underspecified: who's to say how many dimensions of
the array are supposed to get sliced off? There's no reasonable place
to extend this syntax to specify that. It will also be inconsistent
for "foreach scalar in array" to iterate element-by-element no matter
how many dimensions array has, while "foreach array in array" does
something different from that.it reduce just one dimension. Now I expect, and I think so it is
correct, so user knows a used dimension. Just doesn't know a data. So
user can to decide and fill correct type. The design strictly remove
any U.I. from design. So using a incorect type is bug.In other words, your proposal is error-prone to use, restricted in what
it can do, and incapable of being extended later without breaking
things. If there is some redeeming social value to set against those
problems, I'm not seeing it.What I think we should have is
FOREACH scalar-variable IN ARRAY array-expression
which iterates element by element regardless of how many dimensions the
array has. Then there should be some other syntax for iterating over
slices, and we should give some thought to being able to specify how
"deep" the slice is. I can definitely think of use cases for pulling
off either 1 dimension at a time (so you get vectors) or N-1 dimensions
at a time, and it's not out of the realm of reason to want intermediate
cases.Maybe
FOR_EACH scalar-variable IN ARRAY array-expression
FOR_SLICE array-variable [DEPTH n] IN ARRAY array-expression
Or I guess you could use the same leading keyword if you make the depth
specification mandatory for the slice case:FOREACH scalar-variable IN ARRAY array-expression
FOREACH array-variable SLICE n IN ARRAY array-expression
That might be a better idea since it avoids the inevitable argument over
whether the default slice depth should be 1 dimension or N-1 dimensions.
another way:
FOREACH scalar IN ARRAY arr_exp DIMS in dim_var
dim_var being int[], or possibly text, of length #dimensions, giving
per dimesion index.
I like this because it would fit well with alternate form of unnest,
should it ever be written:
create function unnest(anyarray, dims out int[], elem out anyelement)
returns setof...
SLICE notation is still good though, and it's probably faster since
you have less work to do in iteration step? It's certainly easier,
but very plpgsql specific.
merlin
2010/12/17 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
2010/12/17 Tom Lane <tgl@sss.pgh.pa.us>:
Furthermore, it's underspecified: who's to say how many dimensions of
the array are supposed to get sliced off? There's no reasonable place
to extend this syntax to specify that. It will also be inconsistent
for "foreach scalar in array" to iterate element-by-element no matter
how many dimensions array has, while "foreach array in array" does
something different from that.it reduce just one dimension. Now I expect, and I think so it is
correct, so user knows a used dimension. Just doesn't know a data. So
user can to decide and fill correct type. The design strictly remove
any U.I. from design. So using a incorect type is bug.In other words, your proposal is error-prone to use, restricted in what
it can do, and incapable of being extended later without breaking
things. If there is some redeeming social value to set against those
problems, I'm not seeing it.What I think we should have is
FOREACH scalar-variable IN ARRAY array-expression
which iterates element by element regardless of how many dimensions the
array has. Then there should be some other syntax for iterating over
slices, and we should give some thought to being able to specify how
"deep" the slice is. I can definitely think of use cases for pulling
off either 1 dimension at a time (so you get vectors) or N-1 dimensions
at a time, and it's not out of the realm of reason to want intermediate
cases.
I am not against
Maybe
FOR_EACH scalar-variable IN ARRAY array-expression
FOR_SLICE array-variable [DEPTH n] IN ARRAY array-expression
Or I guess you could use the same leading keyword if you make the depth
specification mandatory for the slice case:FOREACH scalar-variable IN ARRAY array-expression
FOREACH array-variable SLICE n IN ARRAY array-expression
I prefer FOREACH keyword. The syntax can be enhanced and I like a talk
about it. I am not sure if SLICE is good keyword for this, but I don't
know better - hope so native speakers can select well. I could to use
maybe "DIMENSIONS" ?
Regards
Pavel
Show quoted text
That might be a better idea since it avoids the inevitable argument over
whether the default slice depth should be 1 dimension or N-1 dimensions.regards, tom lane
2010/12/17 Merlin Moncure <mmoncure@gmail.com>
On Fri, Dec 17, 2010 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
2010/12/17 Tom Lane <tgl@sss.pgh.pa.us>:
Furthermore, it's underspecified: who's to say how many dimensions of
the array are supposed to get sliced off? There's no reasonable place
to extend this syntax to specify that. It will also be inconsistent
for "foreach scalar in array" to iterate element-by-element no matter
how many dimensions array has, while "foreach array in array" does
something different from that.it reduce just one dimension. Now I expect, and I think so it is
correct, so user knows a used dimension. Just doesn't know a data. So
user can to decide and fill correct type. The design strictly remove
any U.I. from design. So using a incorect type is bug.In other words, your proposal is error-prone to use, restricted in what
it can do, and incapable of being extended later without breaking
things. If there is some redeeming social value to set against those
problems, I'm not seeing it.What I think we should have is
FOREACH scalar-variable IN ARRAY array-expression
which iterates element by element regardless of how many dimensions the
array has. Then there should be some other syntax for iterating over
slices, and we should give some thought to being able to specify how
"deep" the slice is. I can definitely think of use cases for pulling
off either 1 dimension at a time (so you get vectors) or N-1 dimensions
at a time, and it's not out of the realm of reason to want intermediate
cases.Maybe
FOR_EACH scalar-variable IN ARRAY array-expression
FOR_SLICE array-variable [DEPTH n] IN ARRAY array-expression
Or I guess you could use the same leading keyword if you make the depth
specification mandatory for the slice case:FOREACH scalar-variable IN ARRAY array-expression
FOREACH array-variable SLICE n IN ARRAY array-expression
That might be a better idea since it avoids the inevitable argument over
whether the default slice depth should be 1 dimension or N-1 dimensions.another way:
FOREACH scalar IN ARRAY arr_exp DIMS in dim_var
dim_var being int[], or possibly text, of length #dimensions, giving
per dimesion index.
If dim_var contains length it is need to be renamed:
FOREACH scalar IN ARRAY arr_exp SIZES IN sizes_var.
I like this because it would fit well with alternate form of unnest,
should it ever be written:create function unnest(anyarray, dims out int[], elem out anyelement)
returns setof...SLICE notation is still good though, and it's probably faster since
you have less work to do in iteration step? It's certainly easier,
but very plpgsql specific.merlin
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
// Dmitriy.
On Sat, Dec 18, 2010 at 02:03, Merlin Moncure <mmoncure@gmail.com> wrote:
FOREACH scalar-variable IN ARRAY array-expression
FOR_EACH scalar-variable IN ARRAY array-expression
FOR_SLICE array-variable [DEPTH n] IN ARRAY array-expression
FOREACH scalar-variable IN ARRAY array-expression
FOREACH array-variable SLICE n IN ARRAY array-expressionFOREACH scalar IN ARRAY arr_exp DIMS in dim_var
It should be not a main subject, but I remember there was a discussion
that "IN ARRAY array-expression" looks redundant for a literal array:
IN ARRAY ARRAY[1, 3, 5]
Are there any improvement for the issue?
--
Itagaki Takahiro
Merlin Moncure <mmoncure@gmail.com> writes:
another way:
FOREACH scalar IN ARRAY arr_exp DIMS in dim_var
dim_var being int[], or possibly text, of length #dimensions, giving
per dimesion index.
[ scratches head... ] I don't follow what you envision this doing,
exactly?
I'm not thrilled with that specific syntax because it'd require making
DIMS a reserved word, but right at the moment I'm more concerned about
what semantics you have in mind.
regards, tom lane
2010/12/17 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
On Sat, Dec 18, 2010 at 02:03, Merlin Moncure <mmoncure@gmail.com> wrote:
FOREACH scalar-variable IN ARRAY array-expression
FOR_EACH scalar-variable IN ARRAY array-expression
FOR_SLICE array-variable [DEPTH n] IN ARRAY array-expression
FOREACH scalar-variable IN ARRAY array-expression
FOREACH array-variable SLICE n IN ARRAY array-expressionFOREACH scalar IN ARRAY arr_exp DIMS in dim_var
It should be not a main subject, but I remember there was a discussion
that "IN ARRAY array-expression" looks redundant for a literal array:IN ARRAY ARRAY[1, 3, 5]
Are there any improvement for the issue?
yes. It know it. The reason for this is bigger space for possible
future features related to FOREACH loop.
Regards
Pavel
Show quoted text
--
Itagaki Takahiro
On Fri, Dec 17, 2010 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Merlin Moncure <mmoncure@gmail.com> writes:
another way:
FOREACH scalar IN ARRAY arr_exp DIMS in dim_var
dim_var being int[], or possibly text, of length #dimensions, giving
per dimesion index.[ scratches head... ] I don't follow what you envision this doing,
exactly?I'm not thrilled with that specific syntax because it'd require making
DIMS a reserved word, but right at the moment I'm more concerned about
what semantics you have in mind.
It's like _pg_expandarray but alterted support multiple dimensions:
select * from unnest_dims(array[['a','b'],['c','d']]) returns
[1,1], 'a'
[1,2], 'b'
[2,1], 'c'
[2,2], 'd'
this provides alternate way of pulling slices, slower possibly, but
more abstract.
merlin
On 12/17/2010 12:15 PM, Pavel Stehule wrote:
2010/12/17 Itagaki Takahiro<itagaki.takahiro@gmail.com>:
It should be not a main subject, but I remember there was a discussion
that "IN ARRAY array-expression" looks redundant for a literal array:IN ARRAY ARRAY[1, 3, 5]
Are there any improvement for the issue?
yes. It know it. The reason for this is bigger space for possible
future features related to FOREACH loop.
So what you're saying is we need to allow ugliness now so we can have
more ugliness in future? I don't find that a convincing argument. I
share the dislike for this syntax.
cheers
andrew
Merlin Moncure <mmoncure@gmail.com> writes:
On Fri, Dec 17, 2010 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[ scratches head... ] �I don't follow what you envision this doing,
exactly?
It's like _pg_expandarray but alterted support multiple dimensions:
select * from unnest_dims(array[['a','b'],['c','d']]) returns
[1,1], 'a'
[1,2], 'b'
[2,1], 'c'
[2,2], 'd'
Oh, so that's an *output* not an input. And IIUC what you are returning
is the subscripts associated with the current element, not the array's
dimensions. Seems like it should go beside the normal target variable
then, not at the end.
FOREACH variable_for_value [, variable_for_subscripts ] IN ARRAY ...
regards, tom lane
Andrew Dunstan <andrew@dunslane.net> writes:
On 12/17/2010 12:15 PM, Pavel Stehule wrote:
The reason for this is bigger space for possible
future features related to FOREACH loop.
So what you're saying is we need to allow ugliness now so we can have
more ugliness in future? I don't find that a convincing argument. I
share the dislike for this syntax.
Well, we did beat up Pavel over trying to shoehorn this facility into
the existing FOR syntax, so I can hardly blame him for thinking this
way. The question is whether we're willing to assume that FOREACH will
be limited to iterating over arrays, meaning we'll be stuck with
inventing yet another initial keyword if some other fundamentally
different concept comes along. Right at the moment I can't think of
any plausible candidates, but ...
regards, tom lane
2010/12/17 Andrew Dunstan <andrew@dunslane.net>:
On 12/17/2010 12:15 PM, Pavel Stehule wrote:
2010/12/17 Itagaki Takahiro<itagaki.takahiro@gmail.com>:
It should be not a main subject, but I remember there was a discussion
that "IN ARRAY array-expression" looks redundant for a literal array:IN ARRAY ARRAY[1, 3, 5]
Are there any improvement for the issue?
yes. It know it. The reason for this is bigger space for possible
future features related to FOREACH loop.So what you're saying is we need to allow ugliness now so we can have more
ugliness in future? I don't find that a convincing argument. I share the
dislike for this syntax.
can be strange from me, but it is. If we close a back door now, then
we have not a space after ten years. There can be possible loops over
records, maybe over other iterable data. With this design is important
one think. A keyword after K_IN must not be a reserved keyword.
I am expecting, so typical use case doesn't be a iteration over
constant array, but over variable
so mostly often you have to write
FOREACH var IN ARRAY second_var
LOOP
...
END LOOP
Regards
Pavel
Show quoted text
cheers
andrew