final patch - plpgsql: for-in-array

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Merlin Moncure (#3)

Re: final patch - plpgsql: for-in-array

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all. AFAICS this patch
simply allows you to replace

for x in select unnest(array_value) loop

with

for x in unnest array_value loop

(plus or minus a parenthesis or so). I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

regards, tom lane

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#4)

Re: final patch - plpgsql: for-in-array

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all. AFAICS this patch
simply allows you to replace

for x in select unnest(array_value) loop

with

for x in unnest array_value loop

(plus or minus a parenthesis or so). I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

this patch is semantically equal to SELECT unnest(..), but it is
evaluated as simple expression and does directly array unpacking and
iteration, - so it means this fragment is significantly >>faster<<.

Regards

Pavel Stehule

Show quoted text

regards, tom lane

cedric.villemain.debian@gmail.com

over 15 years ago

In reply to: Pavel Stehule (#5)

Re: final patch - plpgsql: for-in-array

2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all. AFAICS this patch
simply allows you to replace

for x in select unnest(array_value) loop

with

for x in unnest array_value loop

(plus or minus a parenthesis or so). I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

this patch is semantically equal to SELECT unnest(..), but it is
evaluated as simple expression and does directly array unpacking and
iteration, - so it means this fragment is significantly >>faster<<.

Did you implement a method to be able to walk the array and detoast
only the current needed data ?

(I wonder because I have something like that in that garage : select
array_filter(foo,'like','%bar%',10); where 10 is the limit and can be
avoided, foo is the array, like is callback function, '%bar%' the
parameter for the callback function for filtering results.)

It will make my toy in the garage a fast race car (and probably doable
in (plpg)SQL instead of C) ...

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support

pavel.stehule@gmail.com

over 15 years ago

In reply to: Cédric Villemain (#6)

Re: final patch - plpgsql: for-in-array

2010/11/18 Cédric Villemain <cedric.villemain.debian@gmail.com>:

2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all. AFAICS this patch
simply allows you to replace

for x in select unnest(array_value) loop

with

for x in unnest array_value loop

(plus or minus a parenthesis or so). I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

this patch is semantically equal to SELECT unnest(..), but it is
evaluated as simple expression and does directly array unpacking and
iteration, - so it means this fragment is significantly >>faster<<.

Did you implement a method to be able to walk the array and detoast
only the current needed data ?

not only - iteration over array can help with readability but a
general work with SRF (set returning functions is more harder and
slower) - so special loop statement can to safe a some toast op / when
you use a large array and access via index, or can to safe a some work
with memory, because there isn't necessary convert array to set of
tuples. Please, recheck these tests.

test:

CREATE OR REPLACE FUNCTION rndstr() RETURNS text AS $$select
array_to_string(array(select substring('ABCDEFGHIJKLMNOPQ' FROM
(random()*16)::int FOR 1) from generate_series(1,10)),'')$$ LANGUAGE
sql;

create or replace function rndarray(int) returns text[] as $$select
array(select rndstr() from generate_series(1,$1)) $$ language sql;

create table t10(x text[]);
insert into t10 select rndarray(10) from generate_series(1,10000);
create table t100(x text[]);
insert into t100 select rndarray(100) from generate_series(1,10000);
create table t1000(x text[]);
insert into t1000 select rndarray(1000) from generate_series(1,10000);

CREATE OR REPLACE FUNCTION public.filter(text[], text, integer)
RETURNS text[]
LANGUAGE plpgsql
AS $function$
DECLARE
s text[] := '{}';
l int := 0;
v text;
BEGIN
FOR v IN ARRAY $1
LOOP
EXIT WHEN l = $3;
IF v LIKE $2 THEN
s := s || v;
l := l + 1;
END IF;
END LOOP;
RETURN s;
END;$function$;

postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t10;
avg
--------------------
1.1596079803990200
(1 row)

Time: 393.649 ms

postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t100;
avg
--------------------
3.4976777789245536
(1 row)

Time: 2804.502 ms

postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t1000;
avg
---------------------
10.0000000000000000
(1 row)

Time: 9729.994 ms

CREATE OR REPLACE FUNCTION public.filter01(text[], text, integer)
RETURNS text[]
LANGUAGE plpgsql
AS $function$
DECLARE
s text[] := '{}';
l int := 0;
v text;
BEGIN
FOR v IN SELECT UNNEST($1)
LOOP
EXIT WHEN l = $3;
IF v LIKE $2 THEN
s := s || v;
l := l + 1;
END IF;
END LOOP;
RETURN s;
END;$function$;

postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t10;
avg
--------------------
1.1596079803990200
(1 row)

Time: 795.383 ms

postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t100;
avg
--------------------
3.4976777789245536
(1 row)

Time: 3848.258 ms

postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t1000;
avg
---------------------
10.0000000000000000
(1 row)

Time: 12366.093 ms

The iteration via specialized FOR IN ARRAY is about 25-30% faster than
FOR IN SELECT UNNEST

postgres=# CREATE OR REPLACE FUNCTION public.filter02(text[], text, integer)
RETURNS text[]
LANGUAGE plpgsql
AS $function$
DECLARE
s text[] := '{}';
l int := 0; i int;
v text;
BEGIN
FOR i IN array_lower($1,1)..array_upper($1,1)
LOOP
EXIT WHEN l = $3;
IF $1[i] LIKE $2 THEN
s := s || $1[i];
l := l + 1;
END IF;
END LOOP;
RETURN s;
END;$function$
;

postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t10;
avg
--------------------
1.1596079803990200
(1 row)

Time: 414.960 ms

postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t100;
avg
--------------------
3.4976777789245536
(1 row)

Time: 3460.970 ms

there FOR IN ARRAY is faster about 30% then access per index

for T1000 I had to cancel over 1 minute!!!!

(I wonder because I have something like that in that garage : select
array_filter(foo,'like','%bar%',10); where 10 is the limit and can be
avoided, foo is the array, like is callback function, '%bar%' the
parameter for the callback function for filtering results.)

It will make my toy in the garage a fast race car (and probably doable
in (plpg)SQL instead of C) ...

it can help with reading of array. But it doesn't help with array
updating :(. For large arrays it can be slow too.

Regards

Pavel Stehule

Show quoted text

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support

Merlin Moncure

mmoncure@gmail.com

over 15 years ago

In reply to: Tom Lane (#4)

Re: final patch - plpgsql: for-in-array

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all. AFAICS this patch
simply allows you to replace

for x in select unnest(array_value) loop

with

for x in unnest array_value loop

(plus or minus a parenthesis or so). I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

Pavel's performance argument is imnsho valid. arrays at present are
the best way to pass data around functions and any optimizations here
are very welcome. Given that, is there any way to mitigate your
concerns on the syntax side?

merlin

robertmhaas@gmail.com

over 15 years ago

In reply to: Merlin Moncure (#8)

Re: final patch - plpgsql: for-in-array

On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all. AFAICS this patch
simply allows you to replace

for x in select unnest(array_value) loop

with

for x in unnest array_value loop

(plus or minus a parenthesis or so). I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

Pavel's performance argument is imnsho valid. arrays at present are
the best way to pass data around functions and any optimizations here
are very welcome. Given that, is there any way to mitigate your
concerns on the syntax side?

Can we get the performance benefit any other way? I hate to think
that it will still be slow for people using the already-supported
syntax.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10

cedric.villemain.debian@gmail.com

over 15 years ago

In reply to: Pavel Stehule (#7)

Re: final patch - plpgsql: for-in-array

2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:

2010/11/18 Cédric Villemain <cedric.villemain.debian@gmail.com>:

2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all. AFAICS this patch
simply allows you to replace

for x in select unnest(array_value) loop

with

for x in unnest array_value loop

(plus or minus a parenthesis or so). I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

this patch is semantically equal to SELECT unnest(..), but it is
evaluated as simple expression and does directly array unpacking and
iteration, - so it means this fragment is significantly >>faster<<.

Did you implement a method to be able to walk the array and detoast
only the current needed data ?

not only - iteration over array can help with readability but a
general work with SRF (set returning functions is more harder and
slower) - so special loop statement can to safe a some toast op / when
you use a large array and access via index, or can to safe a some work
with memory, because there isn't necessary convert array to set of
tuples. Please, recheck these tests.

test:

CREATE OR REPLACE FUNCTION rndstr() RETURNS text AS $$select
array_to_string(array(select substring('ABCDEFGHIJKLMNOPQ' FROM
(random()*16)::int FOR 1) from generate_series(1,10)),'')$$ LANGUAGE
sql;

create or replace function rndarray(int) returns text[] as $$select
array(select rndstr() from generate_series(1,$1)) $$ language sql;

create table t10(x text[]);
insert into t10 select rndarray(10) from generate_series(1,10000);
create table t100(x text[]);
insert into t100 select rndarray(100) from generate_series(1,10000);
create table t1000(x text[]);
insert into t1000 select rndarray(1000) from generate_series(1,10000);

CREATE OR REPLACE FUNCTION public.filter(text[], text, integer)
RETURNS text[]
LANGUAGE plpgsql
AS $function$
DECLARE
s text[] := '{}';
l int := 0;
v text;
BEGIN
FOR v IN ARRAY $1
LOOP
EXIT WHEN l = $3;
IF v LIKE $2 THEN
s := s || v;
l := l + 1;
END IF;
END LOOP;
RETURN s;
END;$function$;

postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t10;
avg
--------------------
1.1596079803990200
(1 row)

Time: 393.649 ms

postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t100;
avg
--------------------
3.4976777789245536
(1 row)

Time: 2804.502 ms

postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t1000;
avg
---------------------
10.0000000000000000
(1 row)

Time: 9729.994 ms

CREATE OR REPLACE FUNCTION public.filter01(text[], text, integer)
RETURNS text[]
LANGUAGE plpgsql
AS $function$
DECLARE
s text[] := '{}';
l int := 0;
v text;
BEGIN
FOR v IN SELECT UNNEST($1)
LOOP
EXIT WHEN l = $3;
IF v LIKE $2 THEN
s := s || v;
l := l + 1;
END IF;
END LOOP;
RETURN s;
END;$function$;

postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t10;
avg
--------------------
1.1596079803990200
(1 row)

Time: 795.383 ms

postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t100;
avg
--------------------
3.4976777789245536
(1 row)

Time: 3848.258 ms

postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t1000;
avg
---------------------
10.0000000000000000
(1 row)

Time: 12366.093 ms

The iteration via specialized FOR IN ARRAY is about 25-30% faster than
FOR IN SELECT UNNEST

postgres=# CREATE OR REPLACE FUNCTION public.filter02(text[], text, integer)
RETURNS text[]
LANGUAGE plpgsql
AS $function$
DECLARE
s text[] := '{}';
l int := 0; i int;
v text;
BEGIN
FOR i IN array_lower($1,1)..array_upper($1,1)
LOOP
EXIT WHEN l = $3;
IF $1[i] LIKE $2 THEN
s := s || $1[i];
l := l + 1;
END IF;
END LOOP;
RETURN s;
END;$function$
;

postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t10;
avg
--------------------
1.1596079803990200
(1 row)

Time: 414.960 ms

postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t100;
avg
--------------------
3.4976777789245536
(1 row)

Time: 3460.970 ms

there FOR IN ARRAY is faster about 30% then access per index

for T1000 I had to cancel over 1 minute!!!!

I can't test until this week-end. But I will.

(I wonder because I have something like that in that garage : select
array_filter(foo,'like','%bar%',10); where 10 is the limit and can be
avoided, foo is the array, like is callback function, '%bar%' the
parameter for the callback function for filtering results.)

It will make my toy in the garage a fast race car (and probably doable
in (plpg)SQL instead of C) ...

it can help with reading of array. But it doesn't help with array
updating :(. For large arrays it can be slow too.

select fast is already a good job, thank you.

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support

#11

cedric.villemain.debian@gmail.com

over 15 years ago

In reply to: Robert Haas (#9)

Re: final patch - plpgsql: for-in-array

2010/11/18 Robert Haas <robertmhaas@gmail.com>:

On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all. AFAICS this patch
simply allows you to replace

for x in select unnest(array_value) loop

with

for x in unnest array_value loop

(plus or minus a parenthesis or so). I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

Pavel's performance argument is imnsho valid. arrays at present are
the best way to pass data around functions and any optimizations here
are very welcome. Given that, is there any way to mitigate your
concerns on the syntax side?

Can we get the performance benefit any other way? I hate to think
that it will still be slow for people using the already-supported
syntax.

If you are able to make unnest() outputting 1st row without detoasting
last field.

I think if we have :
#define DatumGetTextPSlice(X,m,n) ((text *) PG_DETOAST_DATUM_SLICE(X,m,n))
but for array, most is done

Pavel, am I correct ?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support

#12

pavel.stehule@gmail.com

over 15 years ago

In reply to: Robert Haas (#9)

Re: final patch - plpgsql: for-in-array

2010/11/18 Robert Haas <robertmhaas@gmail.com>:

On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all. AFAICS this patch
simply allows you to replace

for x in select unnest(array_value) loop

with

for x in unnest array_value loop

(plus or minus a parenthesis or so). I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

Pavel's performance argument is imnsho valid. arrays at present are
the best way to pass data around functions and any optimizations here
are very welcome. Given that, is there any way to mitigate your
concerns on the syntax side?

Can we get the performance benefit any other way? I hate to think
that it will still be slow for people using the already-supported
syntax.

I afraid so other ways are more difficult with deeper impacts on
plpgsql executor.

what is a slow:

a) repeated detoasting - access with subscripts - maybe detoasted
values can be cached?
b) evaluation of SRF expression - maybe call of SRF function can be
simple expression,
c) faster evaluation ro query

The most important is @a. Only a few people uses a FOR IN SELECT
UNNEST form now. Probably not optimizable on PLpgSQL level is form FOR
IN SELECT * FROM UNNEST.

FOR IN ARRAY doesn't impacts on expression executing - this patch is
just simple and isolated.

Pavel

Show quoted text

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13

pavel.stehule@gmail.com

over 15 years ago

In reply to: Cédric Villemain (#11)

Re: final patch - plpgsql: for-in-array

2010/11/18 Cédric Villemain <cedric.villemain.debian@gmail.com>:

2010/11/18 Robert Haas <robertmhaas@gmail.com>:

On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all. AFAICS this patch
simply allows you to replace

for x in select unnest(array_value) loop

with

for x in unnest array_value loop

(plus or minus a parenthesis or so). I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

Pavel's performance argument is imnsho valid. arrays at present are
the best way to pass data around functions and any optimizations here
are very welcome. Given that, is there any way to mitigate your
concerns on the syntax side?

Can we get the performance benefit any other way? I hate to think
that it will still be slow for people using the already-supported
syntax.

If you are able to make unnest() outputting 1st row without detoasting
last field.

I think if we have :
#define DatumGetTextPSlice(X,m,n) ((text *) PG_DETOAST_DATUM_SLICE(X,m,n))
but for array, most is done

Pavel, am I correct ?

yes, it can help, but still if you iterate over complete array, you
have to do n - detoast ops.

Pavel

Show quoted text

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support

#14

andrew@dunslane.net

over 15 years ago

In reply to: Robert Haas (#9)

Re: final patch - plpgsql: for-in-array

On 11/18/2010 10:33 AM, Robert Haas wrote:

On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure<mmoncure@gmail.com> wrote:

Pavel's performance argument is imnsho valid. arrays at present are
the best way to pass data around functions and any optimizations here
are very welcome. Given that, is there any way to mitigate your
concerns on the syntax side?

Can we get the performance benefit any other way? I hate to think
that it will still be slow for people using the already-supported
syntax.

It's not disastrously slower. AFAICT from a very quick glance over the
patch, he's getting the speedup by bypassing the normal mechanism for
evaluating "for x in select ...". So we'd have to special-case that to
trap calls to unnest in the general form. That would be pretty ugly. Or
else make unnest and SPI faster. But that's a much bigger project.

Syntactic sugar is not entirely to be despised, anyway.

cheers

andrew

#15

cedric.villemain.debian@gmail.com

over 15 years ago

In reply to: Pavel Stehule (#13)

Re: final patch - plpgsql: for-in-array

2010/11/18 Pavel Stehule <pavel.stehule@gmail.com>:

2010/11/18 Cédric Villemain <cedric.villemain.debian@gmail.com>:

2010/11/18 Robert Haas <robertmhaas@gmail.com>:

On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

i will start the review of this one... but before that sorry for
suggesting this a bit later but about using UNNEST as part of the
sintax?

Does for-in-array do what unnset does?

Yes, which begs the question of why bother at all. AFAICS this patch
simply allows you to replace

for x in select unnest(array_value) loop

with

for x in unnest array_value loop

(plus or minus a parenthesis or so). I do not think we need to add a
bunch of code and create even more syntactic ambiguity (FOR loops are
already on the hairy edge of unparsability) to save people from writing
"select".

Pavel's performance argument is imnsho valid. arrays at present are
the best way to pass data around functions and any optimizations here
are very welcome. Given that, is there any way to mitigate your
concerns on the syntax side?

Can we get the performance benefit any other way? I hate to think
that it will still be slow for people using the already-supported
syntax.

If you are able to make unnest() outputting 1st row without detoasting
last field.

I think if we have :
#define DatumGetTextPSlice(X,m,n) ((text *) PG_DETOAST_DATUM_SLICE(X,m,n))
but for array, most is done

Pavel, am I correct ?

yes, it can help, but still if you iterate over complete array, you
have to do n - detoast ops.

sure.

Pavel

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support

#16

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Merlin Moncure (#8)

Re: final patch - plpgsql: for-in-array

Merlin Moncure <mmoncure@gmail.com> writes:

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yes, which begs the question of why bother at all.

Pavel's performance argument is imnsho valid.

Well, that argument is unsupported by any evidence, so far as I've seen.

More to the point, if there is indeed an interesting performance win
here, we could get the same win by internally optimizing the existing
syntax. That would provide the benefit to existing code not just
new code; and it would avoid foreclosing our future options for
extending FOR in not-so-redundant ways.

regards, tom lane

#17

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#16)

Re: final patch - plpgsql: for-in-array

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

Merlin Moncure <mmoncure@gmail.com> writes:

On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yes, which begs the question of why bother at all.

Pavel's performance argument is imnsho valid.

Well, that argument is unsupported by any evidence, so far as I've seen.

More to the point, if there is indeed an interesting performance win
here, we could get the same win by internally optimizing the existing
syntax. That would provide the benefit to existing code not just
new code; and it would avoid foreclosing our future options for
extending FOR in not-so-redundant ways.

sorry, but I don't agree. I don't think, so there are some big space
for optimizing - and if then it means much more code complexity for
current expr executor. Next - FOR IN ARRAY takes fields from array on
request - and it is possible, because a unpacking of array is
controlled by statement - it's impossible do same when unpacking is
inside other functions with same effectivity.

Regards

Pavel Stehule

Show quoted text

regards, tom lane

#18

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Pavel Stehule (#17)

Re: final patch - plpgsql: for-in-array

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

More to the point, if there is indeed an interesting performance win
here, we could get the same win by internally optimizing the existing
syntax.

sorry, but I don't agree. I don't think, so there are some big space
for optimizing - and if then it means much more code complexity for
current expr executor. Next - FOR IN ARRAY takes fields from array on
request - and it is possible, because a unpacking of array is
controlled by statement - it's impossible do same when unpacking is
inside other functions with same effectivity.

Just because you haven't thought about it doesn't mean it's impossible
or impractical.

The first implementation I was thinking of would involve looking at the
SELECT querytree after parsing to see if it's "SELECT UNNEST(something)"
--- that is, empty jointree and so on, single tlist item that is an
invocation of the unnest() function.  If it is, you pull out the
argument expression of unnest() and go from there, with exactly the same
execution behavior as in the specialized-syntax patch.  This is
perfectly safe if you identify the array_unnest function by OID: since
it's a built-in function you know exactly what it's supposed to do.

But having said that, it's still not apparent to me that array_unnest
itself is markedly slower than what you could hope to do in plpgsql.
I think the real issue here is that plpgsql's simple-expression code
can't be used with set-returning expressions, which means that we have
to go through the vastly more expensive SPI code path. But that
restriction is largely based on fear of re-using expression trees, which
is something we fixed a few weeks ago. I think that it would now be
interesting to look at whether "FOR x IN SELECT simple-expression" could
use the simple-expression code even when the expression returns set.
That approach might bring a useful speedup not just for unnest, but for
many other use-cases as well.

regards, tom lane

#19

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Andrew Dunstan (#14)

Re: final patch - plpgsql: for-in-array

Andrew Dunstan <andrew@dunslane.net> writes:

Syntactic sugar is not entirely to be despised, anyway.

If it were harmless syntactic sugar I wouldn't be objecting so loudly.
The problem here is that FOR is a syntactic choke point: it's already
overloaded with several different sub-syntaxes that are quite difficult
to separate. Adding another one makes that worse, with the consequences
that we might misinterpret the user's intent, leading either to
misleading/unhelpful error messages or unexpected runtime behavior.
If you consult the archives you can find numerous past instances of
complaints from people who hit such problems with the existing set of
FOR sub-syntaxes, so this isn't hypothetical.

I'm also quite afraid of having a conflict with other future extensions
of FOR, which we might want to introduce either on our own hook or for
compatibility with something Oracle might add to PL/SQL in future.

This might not be the worst place in the entire system to be introducing
inessential syntactic sugar, but it's certainly one of the top ten.
I would *much* rather we get the performance benefit by internal
optimization, and forego inventing syntax.

regards, tom lane

#20

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#18)

Re: final patch - plpgsql: for-in-array

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/11/18 Tom Lane <tgl@sss.pgh.pa.us>:

More to the point, if there is indeed an interesting performance win
here, we could get the same win by internally optimizing the existing
syntax.

sorry, but I don't agree. I don't think, so there are some big space
for optimizing - and if then it means much more code complexity for
current expr executor. Next - FOR IN ARRAY takes fields from array on
request - and it is possible, because a unpacking of array is
controlled by statement - it's impossible do same when unpacking is
inside other functions with same effectivity.

Just because you haven't thought about it doesn't mean it's impossible
or impractical.
The first implementation I was thinking of would involve looking at the
SELECT querytree after parsing to see if it's "SELECT UNNEST(something)"
--- that is, empty jointree and so on, single tlist item that is an
invocation of the unnest() function.  If it is, you pull out the
argument expression of unnest() and go from there, with exactly the same
execution behavior as in the specialized-syntax patch.  This is
perfectly safe if you identify the array_unnest function by OID: since
it's a built-in function you know exactly what it's supposed to do.

this additional control will do slow down for any expression - more -
somebody can use a form: SELECT FROM unnest(), and this form will be
slower.

But having said that, it's still not apparent to me that array_unnest
itself is markedly slower than what you could hope to do in plpgsql.
I think the real issue here is that plpgsql's simple-expression code
can't be used with set-returning expressions, which means that we have
to go through the vastly more expensive SPI code path. But that
restriction is largely based on fear of re-using expression trees, which
is something we fixed a few weeks ago. I think that it would now be
interesting to look at whether "FOR x IN SELECT simple-expression" could
use the simple-expression code even when the expression returns set.
That approach might bring a useful speedup not just for unnest, but for
many other use-cases as well.

any SRF call must not be faster then direct access to array. There is
overhead with tuples.

I don't understand you. Sorry. I don't think, so your objections are objective.

Regards

Pavel

Show quoted text

regards, tom lane

#21

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#19)

#22

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#19)

#23

pavel.stehule@gmail.com

over 15 years ago

In reply to: Robert Haas (#21)

#24

robertmhaas@gmail.com

over 15 years ago

In reply to: Pavel Stehule (#23)

#25

pavel.stehule@gmail.com

over 15 years ago

In reply to: Robert Haas (#24)

#26

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Pavel Stehule (#12)

#27

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Pavel Stehule (#22)

#28

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#26)

#29

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Pavel Stehule (#28)

#30

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Pavel Stehule (#25)

#31

andrew@dunslane.net

over 15 years ago

In reply to: Pavel Stehule (#28)

#32

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#27)

#33

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#29)

#34

pavel.stehule@gmail.com

over 15 years ago

In reply to: Andrew Dunstan (#31)

#35

andrew@dunslane.net

over 15 years ago

In reply to: Pavel Stehule (#34)

#36

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#30)

#37

pavel.stehule@gmail.com

over 15 years ago

In reply to: Andrew Dunstan (#35)

#38

Alvaro Herrera

alvherre@2ndquadrant.com

over 15 years ago

In reply to: Pavel Stehule (#37)

#39

pavel.stehule@gmail.com

over 15 years ago

In reply to: Alvaro Herrera (#38)

#40

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Pavel Stehule (#39)

#41

Andres Freund

andres@anarazel.de

over 15 years ago

In reply to: Alvaro Herrera (#38)

#42

andrew@dunslane.net

over 15 years ago

In reply to: Andres Freund (#41)

#43

cedric.villemain.debian@gmail.com

over 15 years ago

In reply to: Tom Lane (#26)

#44

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#40)

#45

pavel.stehule@gmail.com

over 15 years ago

In reply to: Pavel Stehule (#44)

#46

Valentine Gogichashvili

valgog@gmail.com

over 15 years ago

In reply to: Tom Lane (#27)

#47

Jaime Casanova

jcasanov@systemguards.com.ec

over 15 years ago

In reply to: Jaime Casanova (#2)

#48

robertmhaas@gmail.com

over 15 years ago

In reply to: Valentine Gogichashvili (#46)

#49

robertmhaas@gmail.com

over 15 years ago

In reply to: Jaime Casanova (#47)

#50

pavel.stehule@gmail.com

over 15 years ago

In reply to: Robert Haas (#49)

#51

robertmhaas@gmail.com

over 15 years ago

In reply to: Pavel Stehule (#50)

#52

pavel.stehule@gmail.com

over 15 years ago

In reply to: Robert Haas (#51)

#53

David Fetter

david@fetter.org

over 15 years ago

In reply to: Pavel Stehule (#52)

#54

pavel.stehule@gmail.com

over 15 years ago

In reply to: David Fetter (#53)

#55

robertmhaas@gmail.com

over 15 years ago

In reply to: Pavel Stehule (#52)

#56

cedric.villemain.debian@gmail.com

over 15 years ago

In reply to: Robert Haas (#55)

#57

robertmhaas@gmail.com

over 15 years ago

In reply to: Cédric Villemain (#56)

#58

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Cédric Villemain (#56)

#59

pavel.stehule@gmail.com

over 15 years ago

In reply to: Robert Haas (#57)

#60

pavel.stehule@gmail.com

over 15 years ago

In reply to: Pavel Stehule (#59)

#61

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#58)

#62

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#61)

#63

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#62)

#64