BUG #13798: Unexpected multiple exection of user defined function with out parameters

Started by Michael Langover 10 years ago7 messagesbugs
Jump to latest
#1Michael Lang
mike.lang1010@gmail.com

The following bug has been logged on the website:

Bug reference: 13798
Logged by: Michael Lang
Email address: mike.lang1010@gmail.com
PostgreSQL version: 9.4.5
Operating system: Ubuntu 12.04
Description:

I've found that when a user defined function has
out parameters, it is invoked once per out parameter if invoked with the
syntax:

`SELECT (udf()).*`

Is this the expected behavior? It seems like it shouldn't.

This syntax is undesireable because it is the only way I've found so far to
get the postgresql backend to return all of the out parameters together
as a row, together with the parameters type information, instead of
returning the out parameters together as the text representation of
the composite type that they form together.

To demonstrate, take the function as follows:
```
CREATE FUNCTION demo(
OUT param1 text,
OUT param2 text,
OUT param3 text
) AS $$
BEGIN
param1 := 'foo';
param2 := 'bar';
param3 := 'baz';
END;
$$ LANGUAGE plpgsql
```

The query `SELECT demo();` produces the result
```
testdb=# SELECT demo();
demo
---------------
(foo,bar,baz)
(1 row)
```
Whereas the query `SELECT (demo()).*` produce the result
```
testdb=# SELECT (demo()).*;
param1 | param2 | param3
--------+--------+--------
foo | bar | baz
(1 row)
```

I've yet to find another means to get postgresql to produce the result
in such a form.

Unfortunately, I've found that the `SELECT (udf()).*` form executes the
udf once per out parameter. This is undesirable for both performance
reasons and unacceptable for functions that cause side effects. To
demonstrate that this is happening I've provided the following example:

```
CREATE TABLE test (
i integer
);

INSERT into test (i) VALUES (0);

CREATE FUNCTION reproduceBehavior(
OUT message1 text,
OUT message2 text,
OUT message3 text,
OUT message4 text
)
AS $$
DECLARE t integer;
BEGIN
SELECT i INTO t FROM test limit 1;
IF t = 0 THEN
update test set i=1;
message1 := 'The value of i is now 1';
END IF;
IF t = 1 THEN
update test set i=2;
message2 := 'The value of i is now 2';
END IF;
IF t = 2 THEN
update test set i=3;
message3 := 'The value of i is now 3';
END IF;
IF t = 3 THEN
update test set i=4;
message4 := 'The value of i is now 4';
END IF;
RETURN;
END;
$$ LANGUAGE plpgsql;

SELECT (reproduceBehavior()).*;
```

Which produces the result:

```
message1 | message2 | message3
| message4
-------------------------+-------------------------+-------------------------+-------------------------
The value of i is now 1 | The value of i is now 2 | The value of i is now 3
| The value of i is now 4
(1 row)
```

I've reproduced this behavior on:
PostgreSQL 9.3.10 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

and

PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Michael Lang (#1)
Re: BUG #13798: Unexpected multiple exection of user defined function with out parameters

On Sat, Dec 5, 2015 at 12:30 AM, <mike.lang1010@gmail.com> wrote:

I've found that when a user defined function has
out parameters, it is invoked once per out parameter if invoked with the
syntax:

`SELECT (udf()).*`

Is this the expected behavior? It seems like it shouldn't.

You can see why this happens if you use EXPLAIN this way:

test=# explain (analyze, buffers, verbose) SELECT (reproduceBehavior()).*;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..1.01 rows=1 width=0) (actual time=1.931..1.932
rows=1 loops=1)
Output: (reproducebehavior()).message1,
(reproducebehavior()).message2, (reproducebehavior()).message3,
(reproducebehavior()).message4
Buffers: shared hit=17
Planning time: 0.038 ms
Execution time: 1.968 ms
(5 rows)

That shows that the * causes expansion to the following query:

SELECT
(reproducebehavior()).message1,
(reproducebehavior()).message2,
(reproducebehavior()).message3,
(reproducebehavior()).message4;

From that you can see why it is not surprising that the function is
executed once per OUT parameter, especially if it is VOLATILE.

You seem to be expecting it to behave like this:

SELECT * FROM (SELECT * FROM reproduceBehavior()) x;

Which is interpreted as:

SELECT x.message1, x.message2, x.message3, x.message4
FROM (SELECT * FROM reproduceBehavior()) x;

To avoid surprises, avoid using *.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Michael Lang (#1)
Re: BUG #13798: Unexpected multiple exection of user defined function with out parameters

On Fri, Dec 4, 2015 at 11:30 PM, <mike.lang1010@gmail.com> wrote:

The following bug has been logged on the website:

Bug reference: 13798
Logged by: Michael Lang
Email address: mike.lang1010@gmail.com
PostgreSQL version: 9.4.5
Operating system: Ubuntu 12.04
Description:

I've found that when a user defined function has
out parameters, it is invoked once per out parameter if invoked with the
syntax:

`SELECT (udf()).*`

Is this the expected behavior?

​Yes, it is.

Using the recently introduced "LATERAL" feature is the preferred method.

​(i think...)​
​SELECT *​
​FROM src
LATERAL func_call(src.col);​

If that is not possible you can use a CTE.

WITH func_cte AS (
SELECT function_call(...) --do not expand this yet; pass it out as a
composite
)
SELECT (func_cte.function_call).* --now we expand the composite; not the
unusual parens - they are required.
FROM func_cte;

David J.

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Kevin Grittner (#2)
Re: BUG #13798: Unexpected multiple exection of user defined function with out parameters

On Tue, Dec 8, 2015 at 9:08 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Sat, Dec 5, 2015 at 12:30 AM, <mike.lang1010@gmail.com> wrote:

I've found that when a user defined function has
out parameters, it is invoked once per out parameter if invoked with the
syntax:

`SELECT (udf()).*`

Is this the expected behavior? It seems like it shouldn't.

You can see why this happens if you use EXPLAIN this way:

​You can see "what" happens. The why is still a mystery...

test=# explain (analyze, buffers, verbose) SELECT (reproduceBehavior()).*;
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..1.01 rows=1 width=0) (actual time=1.931..1.932
rows=1 loops=1)
Output: (reproducebehavior()).message1,
(reproducebehavior()).message2, (reproducebehavior()).message3,
(reproducebehavior()).message4
Buffers: shared hit=17
Planning time: 0.038 ms
Execution time: 1.968 ms
(5 rows)

That shows that the * causes expansion to the following query:

SELECT
(reproducebehavior()).message1,
(reproducebehavior()).message2,
(reproducebehavior()).message3,
(reproducebehavior()).message4;

From that you can see why it is not surprising that the function is
executed once per OUT parameter, especially if it is VOLATILE.

​While you've explained how to see what is happening it doesn't remove the
POLA violation that has occurred here.​

You seem to be expecting it to behave like this:

SELECT * FROM (SELECT * FROM reproduceBehavior()) x;

Which is interpreted as:

SELECT x.message1, x.message2, x.message3, x.message4
FROM (SELECT * FROM reproduceBehavior()) x;

To avoid surprises, avoid using *

​OK - but how is one supposed to do that? There is no good way to explode
a composite type, especially one created using a function, without using *.

I've responded to the original thread with two possible alternative query
forms. The CTE one is a hack while the implementation of LATERAL finally
provided a non-hackey means to accomplish the goal. The behavior of
(SELECT (func_call()).*) will likely never change but I'd still argue that
not repeatedly invoking the function would be the better implementation and
the least astonishing one.

David J.

#5Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: David G. Johnston (#4)
Re: BUG #13798: Unexpected multiple exection of user defined function with out parameters

On Tue, Dec 8, 2015 at 10:24 AM, David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Tue, Dec 8, 2015 at 9:08 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

[...] * causes expansion to the following query:

SELECT
(reproducebehavior()).message1,
(reproducebehavior()).message2,
(reproducebehavior()).message3,
(reproducebehavior()).message4;

From that you can see why it is not surprising that the function is
executed once per OUT parameter, especially if it is VOLATILE.

While you've explained how to see what is happening it doesn't remove
the POLA violation that has occurred here.

I tend to think of x.* as being something that is resolved at parse
analysis time, before any consideration is given to how to execute;
thus, when x is a function I didn't find it at all astonishing that
it resolved as above. If you think of * as a macro that expands in
the above manner before planning, the fact that it does not
optimize to a single call is not surprising for a VOLATILE
function; that is required behavior.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#5)
Re: BUG #13798: Unexpected multiple exection of user defined function with out parameters

Kevin Grittner <kgrittn@gmail.com> writes:

On Tue, Dec 8, 2015 at 10:24 AM, David G. Johnston
<david.g.johnston@gmail.com> wrote:

While you've explained how to see what is happening it doesn't remove
the POLA violation that has occurred here.

I tend to think of x.* as being something that is resolved at parse
analysis time, before any consideration is given to how to execute;
thus, when x is a function I didn't find it at all astonishing that
it resolved as above.

As far as table references go, it's effectively required by the SQL spec
that "tab.*" in a SELECT list is expanded to "tab.c1, tab.c2, etc" at
parse time. (I draw this conclusion from the parts of the spec that say
that tab.*'s meaning in a view referencing tab does not change if tab
subsequently gains new columns.)

It's definitely annoying that (foo()).* is expanded similarly; but to fix
that we'd have to introduce an additional layer of evaluation into SELECT
lists, and I think that might create some visible semantic oddities of its
own. Between backwards-compatibility worries and the existence of the
LATERAL workaround, it's unlikely we'll ever change this.

I'm not sure how well this point is documented, though. Might be worth
some effort in that direction.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Michael Lang
mike.lang1010@gmail.com
In reply to: Tom Lane (#6)
Re: BUG #13798: Unexpected multiple exection of user defined function with out parameters

On Tue, Dec 8, 2015 at 1:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kevin Grittner <kgrittn@gmail.com> writes:

On Tue, Dec 8, 2015 at 10:24 AM, David G. Johnston
<david.g.johnston@gmail.com> wrote:

While you've explained how to see what is happening it doesn't remove
the POLA violation that has occurred here.

I tend to think of x.* as being something that is resolved at parse
analysis time, before any consideration is given to how to execute;
thus, when x is a function I didn't find it at all astonishing that
it resolved as above.

As far as table references go, it's effectively required by the SQL spec
that "tab.*" in a SELECT list is expanded to "tab.c1, tab.c2, etc" at
parse time. (I draw this conclusion from the parts of the spec that say
that tab.*'s meaning in a view referencing tab does not change if tab
subsequently gains new columns.)

It's definitely annoying that (foo()).* is expanded similarly; but to fix
that we'd have to introduce an additional layer of evaluation into SELECT
lists, and I think that might create some visible semantic oddities of its
own. Between backwards-compatibility worries and the existence of the
LATERAL workaround, it's unlikely we'll ever change this.

I'm not sure how well this point is documented, though. Might be worth
some effort in that direction.

regards, tom lane

Thanks to everyone for the attention given, and the illuminating
responses. I'm not sure how I passed over the SELECT * FROM udf(); form
suggested by Oleksandr Shulgin, but it produces the desired result -
executing the function once while returning the out parameters as a row
rather than as the text literal representation of the composite type. It's
an odd quirk that (udf()).* may execute the function multiple times, but
the workarounds make it simple enough to avoid.

Thanks!
Mike Lang