final patch - plpgsql: for-in-array

Started by Pavel Stehuleover 15 years ago64 messageshackers
Jump to latest
#1Pavel Stehule
pavel.stehule@gmail.com

Hello

this patch implement a new iteration construct - iteration over an
array. The sense of this new iteration is:
* a simple and cleaner syntax
* a faster execution - this bring down a number of detoast operations

create or replace function subscripts(anyarray, int)
returns int[] as $$
select array(select generate_subscripts($1,$2));
$$ language sql;

create or replace function fora_test()
returns int as $$
declare x int; s int = 0;
a int[] := array[1,2,3,4,5,6,7,8,9,10];
begin
for x in array subscripts(a, 1)
loop
s := s + a[x];
end loop;
return s;
end;
$$ language plpgsql;

create or replace function fora_test()
returns int as $$
declare x int; s int = 0;
begin
for x in array array[1,2,3,4,5,6,7,8,9,10]
loop
s := s + x;
end loop;
return s;
end;
$$ language plpgsql;

create or replace function fora_test()
returns int as $$
declare x int; y int;
a fora_point[] := array[(1,2),(3,4),(5,6)];
begin
for x, y in array a
loop
raise notice 'point=%,%', x, y;
end loop;
return 0;
end;
$$ language plpgsql;

Regards

Pavel Stehule

Attachments:

for-in-arrayapplication/octet-stream; name=for-in-arrayDownload+571-74
#2Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Pavel Stehule (#1)
Re: final patch - plpgsql: for-in-array

On Thu, Sep 30, 2010 at 7:46 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

this patch implement a new iteration construct - iteration over an
array. The sense of this new iteration is:
 * a simple and cleaner syntax

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

FOR var IN UNNEST array_expr LOOP
END LOOP

i like this because:
1) is cleaner when array_expr is ARRAY[1,2,3]
2) is not legal now to use the unnest() function without a SELECT in
the context of a FOR loop (or am i missing something?)
3) the unnest() function does the same so seems intuitive what a
FOR-IN-UNNEST do

what i don't know if is this syntax could co-exist with the unnest() function?

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL

#3Merlin Moncure
mmoncure@gmail.com
In reply to: Jaime Casanova (#2)
Re: final patch - plpgsql: for-in-array

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

On Thu, Sep 30, 2010 at 7:46 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

this patch implement a new iteration construct - iteration over an
array. The sense of this new iteration is:
 * a simple and cleaner syntax

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does? unnest() flattens the whole
array into scalars irregardless of dimensions:
select unnest(array[array[1,2],array[3,4]]);
unnest
--------
1
2
3
4

If yes, then +1 (unless there is some other problem) otherwise -1.

merlin

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#3)
Re: final patch - plpgsql: for-in-array

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all. AFAICS this patch
simply allows you to replace

for x in select unnest(array_value) loop

with

for x in unnest array_value loop

(plus or minus a parenthesis or so). I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

regards, tom lane

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#4)
Re: final patch - plpgsql: for-in-array

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all.  AFAICS this patch
simply allows you to replace

       for x in select unnest(array_value) loop

with

       for x in unnest array_value loop

(plus or minus a parenthesis or so).  I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

this patch is semantically equal to SELECT unnest(..), but it is
evaluated as simple expression and does directly array unpacking and
iteration, - so it means this fragment is significantly >>faster<<.

Regards

Pavel Stehule

Show quoted text

                       regards, tom lane

#6Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Pavel Stehule (#5)
Re: final patch - plpgsql: for-in-array

2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all.  AFAICS this patch
simply allows you to replace

       for x in select unnest(array_value) loop

with

       for x in unnest array_value loop

(plus or minus a parenthesis or so).  I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

this patch is semantically equal to SELECT unnest(..), but it is
evaluated as simple expression and does directly array unpacking and
iteration, - so it means this fragment is significantly >>faster<<.

Did you implement a method to be able to walk the array and detoast
only the current needed data ?

(I wonder because I have something like that in that garage : select
array_filter(foo,'like','%bar%',10); where 10 is the limit and can be
avoided, foo is the array, like is callback function, '%bar%' the
parameter for the callback function for filtering results.)

It will make my toy in the garage a fast race car (and probably doable
in (plpg)SQL instead of C) ...

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Cédric Villemain (#6)
Re: final patch - plpgsql: for-in-array

2010/11/18 Cédric Villemain <cedric.villemain.debian@gmail.com>:

2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all.  AFAICS this patch
simply allows you to replace

       for x in select unnest(array_value) loop

with

       for x in unnest array_value loop

(plus or minus a parenthesis or so).  I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

this patch is semantically equal to SELECT unnest(..), but it is
evaluated as simple expression and does directly array unpacking and
iteration, - so it means this fragment is significantly >>faster<<.

Did you implement a method to be able to walk the array and detoast
only the current needed data ?

not only - iteration over array can help with readability but a
general work with SRF (set returning functions is more harder and
slower) - so special loop statement can to safe a some toast op / when
you use a large array and access via index, or can to safe a some work
with memory, because there isn't necessary convert array to set of
tuples. Please, recheck these tests.

test:

CREATE OR REPLACE FUNCTION rndstr() RETURNS text AS $$select
array_to_string(array(select substring('ABCDEFGHIJKLMNOPQ' FROM
(random()*16)::int FOR 1) from generate_series(1,10)),'')$$ LANGUAGE
sql;

create or replace function rndarray(int) returns text[] as $$select
array(select rndstr() from generate_series(1,$1)) $$ language sql;

create table t10(x text[]);
insert into t10 select rndarray(10) from generate_series(1,10000);
create table t100(x text[]);
insert into t100 select rndarray(100) from generate_series(1,10000);
create table t1000(x text[]);
insert into t1000 select rndarray(1000) from generate_series(1,10000);

CREATE OR REPLACE FUNCTION public.filter(text[], text, integer)
RETURNS text[]
LANGUAGE plpgsql
AS $function$
DECLARE
s text[] := '{}';
l int := 0;
v text;
BEGIN
FOR v IN ARRAY $1
LOOP
EXIT WHEN l = $3;
IF v LIKE $2 THEN
s := s || v;
l := l + 1;
END IF;
END LOOP;
RETURN s;
END;$function$;

postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t10;
avg
--------------------
1.1596079803990200
(1 row)

Time: 393.649 ms

postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t100;
avg
--------------------
3.4976777789245536
(1 row)

Time: 2804.502 ms

postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t1000;
avg
---------------------
10.0000000000000000
(1 row)

Time: 9729.994 ms

CREATE OR REPLACE FUNCTION public.filter01(text[], text, integer)
RETURNS text[]
LANGUAGE plpgsql
AS $function$
DECLARE
s text[] := '{}';
l int := 0;
v text;
BEGIN
FOR v IN SELECT UNNEST($1)
LOOP
EXIT WHEN l = $3;
IF v LIKE $2 THEN
s := s || v;
l := l + 1;
END IF;
END LOOP;
RETURN s;
END;$function$;

postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t10;
avg
--------------------
1.1596079803990200
(1 row)

Time: 795.383 ms

postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t100;
avg
--------------------
3.4976777789245536
(1 row)

Time: 3848.258 ms

postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t1000;
avg
---------------------
10.0000000000000000
(1 row)

Time: 12366.093 ms

The iteration via specialized FOR IN ARRAY is about 25-30% faster than
FOR IN SELECT UNNEST

postgres=# CREATE OR REPLACE FUNCTION public.filter02(text[], text, integer)
RETURNS text[]
LANGUAGE plpgsql
AS $function$
DECLARE
s text[] := '{}';
l int := 0; i int;
v text;
BEGIN
FOR i IN array_lower($1,1)..array_upper($1,1)
LOOP
EXIT WHEN l = $3;
IF $1[i] LIKE $2 THEN
s := s || $1[i];
l := l + 1;
END IF;
END LOOP;
RETURN s;
END;$function$
;

postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t10;
avg
--------------------
1.1596079803990200
(1 row)

Time: 414.960 ms

postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t100;
avg
--------------------
3.4976777789245536
(1 row)

Time: 3460.970 ms

there FOR IN ARRAY is faster about 30% then access per index

for T1000 I had to cancel over 1 minute!!!!

(I wonder because I have something like that in that garage : select
array_filter(foo,'like','%bar%',10); where 10 is the limit and can be
avoided, foo is the array, like is callback function, '%bar%' the
parameter for the callback function for filtering results.)

It will make my toy in the garage a fast race car (and probably doable
in (plpg)SQL instead of C) ...

it can help with reading of array. But it doesn't help with array
updating :(. For large arrays it can be slow too.

Regards

Pavel Stehule

Show quoted text

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#8Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#4)
Re: final patch - plpgsql: for-in-array

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all.  AFAICS this patch
simply allows you to replace

       for x in select unnest(array_value) loop

with

       for x in unnest array_value loop

(plus or minus a parenthesis or so).  I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

Pavel's performance argument is imnsho valid. arrays at present are
the best way to pass data around functions and any optimizations here
are very welcome. Given that, is there any way to mitigate your
concerns on the syntax side?

merlin

#9Robert Haas
robertmhaas@gmail.com
In reply to: Merlin Moncure (#8)
Re: final patch - plpgsql: for-in-array

On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all.  AFAICS this patch
simply allows you to replace

       for x in select unnest(array_value) loop

with

       for x in unnest array_value loop

(plus or minus a parenthesis or so).  I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

Pavel's performance argument is imnsho valid. arrays at present are
the best way to pass data around functions and any optimizations here
are very welcome.  Given that, is there any way to mitigate your
concerns on the syntax side?

Can we get the performance benefit any other way? I hate to think
that it will still be slow for people using the already-supported
syntax.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Pavel Stehule (#7)
Re: final patch - plpgsql: for-in-array

2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:

2010/11/18 Cédric Villemain <cedric.villemain.debian@gmail.com>:

2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all.  AFAICS this patch
simply allows you to replace

       for x in select unnest(array_value) loop

with

       for x in unnest array_value loop

(plus or minus a parenthesis or so).  I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

this patch is semantically equal to SELECT unnest(..), but it is
evaluated as simple expression and does directly array unpacking and
iteration, - so it means this fragment is significantly >>faster<<.

Did you implement a method to be able to walk the array and detoast
only the current needed data ?

not only - iteration over array can help with readability but a
general work with SRF (set returning functions is more harder and
slower) - so special loop statement can to safe a some toast op / when
you use a large array and access via index, or can to safe a some work
with memory, because there isn't necessary convert array to set of
tuples. Please, recheck these tests.

test:

CREATE OR REPLACE FUNCTION rndstr() RETURNS text AS $$select
array_to_string(array(select substring('ABCDEFGHIJKLMNOPQ' FROM
(random()*16)::int FOR 1) from generate_series(1,10)),'')$$ LANGUAGE
sql;

create or replace function rndarray(int) returns text[] as $$select
array(select rndstr() from generate_series(1,$1)) $$ language sql;

create table t10(x text[]);
insert into t10 select rndarray(10) from generate_series(1,10000);
create table t100(x text[]);
insert into t100 select rndarray(100) from generate_series(1,10000);
create table t1000(x text[]);
insert into t1000 select rndarray(1000) from generate_series(1,10000);

CREATE OR REPLACE FUNCTION public.filter(text[], text, integer)
 RETURNS text[]
 LANGUAGE plpgsql
AS $function$
DECLARE
 s text[] := '{}';
 l int := 0;
 v text;
BEGIN
 FOR v IN ARRAY $1
 LOOP
   EXIT WHEN l = $3;
   IF v LIKE $2 THEN
     s := s || v;
     l := l + 1;
   END IF;
 END LOOP;
 RETURN s;
END;$function$;

postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t10;
       avg
--------------------
 1.1596079803990200
(1 row)

Time: 393.649 ms

postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t100;
       avg
--------------------
 3.4976777789245536
(1 row)

Time: 2804.502 ms

postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t1000;
        avg
---------------------
 10.0000000000000000
(1 row)

Time: 9729.994 ms

CREATE OR REPLACE FUNCTION public.filter01(text[], text, integer)
 RETURNS text[]
 LANGUAGE plpgsql
AS $function$
DECLARE
 s text[] := '{}';
 l int := 0;
 v text;
BEGIN
 FOR v IN SELECT UNNEST($1)
 LOOP
   EXIT WHEN l = $3;
   IF v LIKE $2 THEN
     s := s || v;
     l := l + 1;
   END IF;
 END LOOP;
 RETURN s;
END;$function$;

postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t10;
       avg
--------------------
 1.1596079803990200
(1 row)

Time: 795.383 ms

postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t100;
       avg
--------------------
 3.4976777789245536
(1 row)

Time: 3848.258 ms

postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t1000;
        avg
---------------------
 10.0000000000000000
(1 row)

Time: 12366.093 ms

The iteration via specialized FOR IN ARRAY is about 25-30% faster than
FOR IN SELECT UNNEST

postgres=# CREATE OR REPLACE FUNCTION public.filter02(text[], text, integer)
 RETURNS text[]
 LANGUAGE plpgsql
AS $function$
DECLARE
 s text[] := '{}';
 l int := 0; i int;
 v text;
BEGIN
 FOR i IN array_lower($1,1)..array_upper($1,1)
 LOOP
   EXIT WHEN l = $3;
   IF $1[i] LIKE $2 THEN
     s := s || $1[i];
     l := l + 1;
   END IF;
 END LOOP;
 RETURN s;
END;$function$
;

postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t10;
       avg
--------------------
 1.1596079803990200
(1 row)

Time: 414.960 ms

postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t100;
       avg
--------------------
 3.4976777789245536
(1 row)

Time: 3460.970 ms

there FOR IN ARRAY is faster about 30% then access per index

for T1000 I had to cancel over 1 minute!!!!

I can't test until this week-end. But I will.

(I wonder because I have something like that in that garage : select
array_filter(foo,'like','%bar%',10); where 10 is the limit and can be
avoided, foo is the array, like is callback function, '%bar%' the
parameter for the callback function for filtering results.)

It will make my toy in the garage a fast race car (and probably doable
in (plpg)SQL instead of C) ...

it can help with reading of array. But it doesn't help with array
updating :(. For large arrays it can be slow too.

select fast is already a good job, thank you.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#11Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Robert Haas (#9)
Re: final patch - plpgsql: for-in-array

2010/11/18 Robert Haas <robertmhaas@gmail.com>:

On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all.  AFAICS this patch
simply allows you to replace

       for x in select unnest(array_value) loop

with

       for x in unnest array_value loop

(plus or minus a parenthesis or so).  I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

Pavel's performance argument is imnsho valid. arrays at present are
the best way to pass data around functions and any optimizations here
are very welcome.  Given that, is there any way to mitigate your
concerns on the syntax side?

Can we get the performance benefit any other way?  I hate to think
that it will still be slow for people using the already-supported
syntax.

If you are able to make unnest() outputting 1st row without detoasting
last field.

I think if we have :
#define DatumGetTextPSlice(X,m,n) ((text *) PG_DETOAST_DATUM_SLICE(X,m,n))
but for array, most is done

Pavel, am I correct ?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#12Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#9)
Re: final patch - plpgsql: for-in-array

2010/11/18 Robert Haas <robertmhaas@gmail.com>:

On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all.  AFAICS this patch
simply allows you to replace

       for x in select unnest(array_value) loop

with

       for x in unnest array_value loop

(plus or minus a parenthesis or so).  I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

Pavel's performance argument is imnsho valid. arrays at present are
the best way to pass data around functions and any optimizations here
are very welcome.  Given that, is there any way to mitigate your
concerns on the syntax side?

Can we get the performance benefit any other way?  I hate to think
that it will still be slow for people using the already-supported
syntax.

I afraid so other ways are more difficult with deeper impacts on
plpgsql executor.

what is a slow:

a) repeated detoasting - access with subscripts - maybe detoasted
values can be cached?
b) evaluation of SRF expression - maybe call of SRF function can be
simple expression,
c) faster evaluation ro query

The most important is @a. Only a few people uses a FOR IN SELECT
UNNEST form now. Probably not optimizable on PLpgSQL level is form FOR
IN SELECT * FROM UNNEST.

FOR IN ARRAY doesn't impacts on expression executing - this patch is
just simple and isolated.

Pavel

Show quoted text

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Pavel Stehule
pavel.stehule@gmail.com
In reply to: Cédric Villemain (#11)
Re: final patch - plpgsql: for-in-array

2010/11/18 Cédric Villemain <cedric.villemain.debian@gmail.com>:

2010/11/18 Robert Haas <robertmhaas@gmail.com>:

On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all.  AFAICS this patch
simply allows you to replace

       for x in select unnest(array_value) loop

with

       for x in unnest array_value loop

(plus or minus a parenthesis or so).  I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

Pavel's performance argument is imnsho valid. arrays at present are
the best way to pass data around functions and any optimizations here
are very welcome.  Given that, is there any way to mitigate your
concerns on the syntax side?

Can we get the performance benefit any other way?  I hate to think
that it will still be slow for people using the already-supported
syntax.

If you are able to make unnest() outputting 1st row without detoasting
last field.

I think if we have :
#define DatumGetTextPSlice(X,m,n)   ((text *) PG_DETOAST_DATUM_SLICE(X,m,n))
but for array, most is done

Pavel, am I correct ?

yes, it can help, but still if you iterate over complete array, you
have to do n - detoast ops.

Pavel

Show quoted text

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#14Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#9)
Re: final patch - plpgsql: for-in-array

On 11/18/2010 10:33 AM, Robert Haas wrote:

On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure<mmoncure@gmail.com> wrote:

Pavel's performance argument is imnsho valid. arrays at present are
the best way to pass data around functions and any optimizations here
are very welcome. Given that, is there any way to mitigate your
concerns on the syntax side?

Can we get the performance benefit any other way? I hate to think
that it will still be slow for people using the already-supported
syntax.

It's not disastrously slower. AFAICT from a very quick glance over the
patch, he's getting the speedup by bypassing the normal mechanism for
evaluating "for x in select ...". So we'd have to special-case that to
trap calls to unnest in the general form. That would be pretty ugly. Or
else make unnest and SPI faster. But that's a much bigger project.

Syntactic sugar is not entirely to be despised, anyway.

cheers

andrew

#15Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Pavel Stehule (#13)
Re: final patch - plpgsql: for-in-array

2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:

2010/11/18 Cédric Villemain <cedric.villemain.debian@gmail.com>:

2010/11/18 Robert Haas <robertmhaas@gmail.com>:

On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all.  AFAICS this patch
simply allows you to replace

       for x in select unnest(array_value) loop

with

       for x in unnest array_value loop

(plus or minus a parenthesis or so).  I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

Pavel's performance argument is imnsho valid. arrays at present are
the best way to pass data around functions and any optimizations here
are very welcome.  Given that, is there any way to mitigate your
concerns on the syntax side?

Can we get the performance benefit any other way?  I hate to think
that it will still be slow for people using the already-supported
syntax.

If you are able to make unnest() outputting 1st row without detoasting
last field.

I think if we have :
#define DatumGetTextPSlice(X,m,n)   ((text *) PG_DETOAST_DATUM_SLICE(X,m,n))
but for array, most is done

Pavel, am I correct ?

yes, it can help, but still if you iterate over complete array, you
have to do n - detoast ops.

sure.

Pavel

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#8)
Re: final patch - plpgsql: for-in-array

Merlin Moncure <mmoncure@gmail.com> writes:

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yes, which begs the question of why bother at all.

Pavel's performance argument is imnsho valid.

Well, that argument is unsupported by any evidence, so far as I've seen.

More to the point, if there is indeed an interesting performance win
here, we could get the same win by internally optimizing the existing
syntax. That would provide the benefit to existing code not just
new code; and it would avoid foreclosing our future options for
extending FOR in not-so-redundant ways.

regards, tom lane

#17Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#16)
Re: final patch - plpgsql: for-in-array

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

Merlin Moncure <mmoncure@gmail.com> writes:

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yes, which begs the question of why bother at all.

Pavel's performance argument is imnsho valid.

Well, that argument is unsupported by any evidence, so far as I've seen.

More to the point, if there is indeed an interesting performance win
here, we could get the same win by internally optimizing the existing
syntax.  That would provide the benefit to existing code not just
new code; and it would avoid foreclosing our future options for
extending FOR in not-so-redundant ways.

sorry, but I don't agree. I don't think, so there are some big space
for optimizing - and if then it means much more code complexity for
current expr executor. Next - FOR IN ARRAY takes fields from array on
request - and it is possible, because a unpacking of array is
controlled by statement - it's impossible do same when unpacking is
inside other functions with same effectivity.

Regards

Pavel Stehule

Show quoted text

                       regards, tom lane

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#17)
Re: final patch - plpgsql: for-in-array

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

More to the point, if there is indeed an interesting performance win
here, we could get the same win by internally optimizing the existing
syntax.

sorry, but I don't agree. I don't think, so there are some big space
for optimizing - and if then it means much more code complexity for
current expr executor. Next - FOR IN ARRAY takes fields from array on
request - and it is possible, because a unpacking of array is
controlled by statement - it's impossible do same when unpacking is
inside other functions with same effectivity.

Just because you haven't thought about it doesn't mean it's impossible
or impractical.

The first implementation I was thinking of would involve looking at the
SELECT querytree after parsing to see if it's "SELECT UNNEST(something)"
--- that is, empty jointree and so on, single tlist item that is an
invocation of the unnest() function.  If it is, you pull out the
argument expression of unnest() and go from there, with exactly the same
execution behavior as in the specialized-syntax patch.  This is
perfectly safe if you identify the array_unnest function by OID: since
it's a built-in function you know exactly what it's supposed to do.

But having said that, it's still not apparent to me that array_unnest
itself is markedly slower than what you could hope to do in plpgsql.
I think the real issue here is that plpgsql's simple-expression code
can't be used with set-returning expressions, which means that we have
to go through the vastly more expensive SPI code path. But that
restriction is largely based on fear of re-using expression trees, which
is something we fixed a few weeks ago. I think that it would now be
interesting to look at whether "FOR x IN SELECT simple-expression" could
use the simple-expression code even when the expression returns set.
That approach might bring a useful speedup not just for unnest, but for
many other use-cases as well.

regards, tom lane

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#14)
Re: final patch - plpgsql: for-in-array

Andrew Dunstan <andrew@dunslane.net> writes:

Syntactic sugar is not entirely to be despised, anyway.

If it were harmless syntactic sugar I wouldn't be objecting so loudly.
The problem here is that FOR is a syntactic choke point: it's already
overloaded with several different sub-syntaxes that are quite difficult
to separate. Adding another one makes that worse, with the consequences
that we might misinterpret the user's intent, leading either to
misleading/unhelpful error messages or unexpected runtime behavior.
If you consult the archives you can find numerous past instances of
complaints from people who hit such problems with the existing set of
FOR sub-syntaxes, so this isn't hypothetical.

I'm also quite afraid of having a conflict with other future extensions
of FOR, which we might want to introduce either on our own hook or for
compatibility with something Oracle might add to PL/SQL in future.

This might not be the worst place in the entire system to be introducing
inessential syntactic sugar, but it's certainly one of the top ten.
I would *much* rather we get the performance benefit by internal
optimization, and forego inventing syntax.

regards, tom lane

#20Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#18)
Re: final patch - plpgsql: for-in-array

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

More to the point, if there is indeed an interesting performance win
here, we could get the same win by internally optimizing the existing
syntax.

sorry, but I don't agree. I don't think, so there are some big space
for optimizing - and if then it means much more code complexity for
current expr executor. Next - FOR IN ARRAY takes fields from array on
request - and it is possible, because a unpacking of array is
controlled by statement - it's impossible do same when unpacking is
inside other functions with same effectivity.

Just because you haven't thought about it doesn't mean it's impossible
or impractical.

The first implementation I was thinking of would involve looking at the
SELECT querytree after parsing to see if it's "SELECT UNNEST(something)"
--- that is, empty jointree and so on, single tlist item that is an
invocation of the unnest() function.  If it is, you pull out the
argument expression of unnest() and go from there, with exactly the same
execution behavior as in the specialized-syntax patch.  This is
perfectly safe if you identify the array_unnest function by OID: since
it's a built-in function you know exactly what it's supposed to do.

this additional control will do slow down for any expression - more -
somebody can use a form: SELECT FROM unnest(), and this form will be
slower.

But having said that, it's still not apparent to me that array_unnest
itself is markedly slower than what you could hope to do in plpgsql.
I think the real issue here is that plpgsql's simple-expression code
can't be used with set-returning expressions, which means that we have
to go through the vastly more expensive SPI code path.  But that
restriction is largely based on fear of re-using expression trees, which
is something we fixed a few weeks ago.  I think that it would now be
interesting to look at whether "FOR x IN SELECT simple-expression" could
use the simple-expression code even when the expression returns set.
That approach might bring a useful speedup not just for unnest, but for
many other use-cases as well.

any SRF call must not be faster then direct access to array. There is
overhead with tuples.

I don't understand you. Sorry. I don't think, so your objections are objective.

Regards

Pavel

Show quoted text

                       regards, tom lane

#21Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#19)
#22Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#19)
#23Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#21)
#24Robert Haas
robertmhaas@gmail.com
In reply to: Pavel Stehule (#23)
#25Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#12)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#22)
#28Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#26)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#25)
#31Andrew Dunstan
andrew@dunslane.net
In reply to: Pavel Stehule (#28)
#32Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#27)
#33Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#29)
#34Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andrew Dunstan (#31)
#35Andrew Dunstan
andrew@dunslane.net
In reply to: Pavel Stehule (#34)
#36Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#30)
#37Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andrew Dunstan (#35)
#38Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Pavel Stehule (#37)
#39Pavel Stehule
pavel.stehule@gmail.com
In reply to: Alvaro Herrera (#38)
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#39)
#41Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#38)
#42Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#41)
#43Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Tom Lane (#26)
#44Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#40)
#45Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#44)
In reply to: Tom Lane (#27)
#47Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Jaime Casanova (#2)
#48Robert Haas
robertmhaas@gmail.com
In reply to: Valentine Gogichashvili (#46)
#49Robert Haas
robertmhaas@gmail.com
In reply to: Jaime Casanova (#47)
#50Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#49)
#51Robert Haas
robertmhaas@gmail.com
In reply to: Pavel Stehule (#50)
#52Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#51)
#53David Fetter
david@fetter.org
In reply to: Pavel Stehule (#52)
#54Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Fetter (#53)
#55Robert Haas
robertmhaas@gmail.com
In reply to: Pavel Stehule (#52)
#56Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Robert Haas (#55)
#57Robert Haas
robertmhaas@gmail.com
In reply to: Cédric Villemain (#56)
#58Tom Lane
tgl@sss.pgh.pa.us
In reply to: Cédric Villemain (#56)
#59Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#57)
#60Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#59)
#61Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#58)
#62Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#61)
#63Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#62)
#64Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#62)