Endless loop calling PL/Python set returning functions

Started by Alexey Grishchenko about 10 years ago, 10 messages, hackers
#1Alexey Grishchenko
agrishchenko@pivotal.io

Hello

There is a bug in the implementation of set-returning functions in PL/Python.
When the same set-returning function is called twice in a single query, the
executor falls into an infinite loop that ends in an OOM. Here is a simple
reproduction of the issue:

CREATE OR REPLACE FUNCTION func(iter int) RETURNS SETOF int AS $$
return xrange(iter)
$$ LANGUAGE plpythonu;

select func(3), func(4);

The endless loop arises because PL/Python keeps one PLyProcedure structure
per function, holding information specific to that function. This structure
also stores the result set iterator returned by the Python function call.
When the same function is called twice, PL/Python uses the same structure
for both calls, and therefore the same result set iterator
(PLyProcedure.setof), which the two calls keep updating in turn. When the
iterator reaches its end, the first call sets it to NULL. Postgres then
invokes the second call, which finds a NULL iterator and calls the Python
function again, obtaining a fresh iterator. This repeats forever.

Set-returning functions in Postgres should instead use the SRF_* macros,
which give access to a per-call function context (FuncCallContext). In my
implementation this context stores the iterator over the function's result
set, so the two calls get separate iterators and the query succeeds.
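
The idea of keeping the iterator in per-call state can be sketched in plain
Python as follows. CallContext is a hypothetical stand-in for
FuncCallContext, and padding the shorter set with None is a simplification
of this sketch, not the executor's real row-combination rule:

```python
class CallContext:
    """Hypothetical stand-in for FuncCallContext: private to one call site."""

    def __init__(self):
        self.iterator = None


def srf_call(ctx, n):
    """Return the next value for this call site, or None once exhausted."""
    if ctx.iterator is None:
        ctx.iterator = iter(range(n))  # first call: run the function once
    try:
        return next(ctx.iterator)
    except StopIteration:
        return None  # the exhausted iterator stays in place; no restart


def run_query(n1, n2):
    """Model of `select func(n1), func(n2)` with one context per call site."""
    ctx1, ctx2 = CallContext(), CallContext()
    rows = []
    while True:
        a, b = srf_call(ctx1, n1), srf_call(ctx2, n2)
        if a is None and b is None:
            return rows
        rows.append((a, b))


rows = run_query(3, 4)
```

Because each call site owns its iterator, neither can consume or reset the
other's state, and the loop terminates.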

Another issue with calling the same set-returning function twice in the
same query is that each call deletes the function's input parameters from
the globals dictionary when it finishes. With two calls, this code tries to
delete the same entry from the globals dictionary twice, which raises a
KeyError. This is why PLy_function_delete_args is modified as well: it now
checks whether the key is present in the globals dictionary before deleting
it.
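
The double-delete problem and the membership check can be illustrated with an
ordinary Python dictionary. This is only an analogue of the C code; the two
helper functions are hypothetical and are not PLy_function_delete_args
itself:

```python
def delete_args_unguarded(globals_dict, names):
    """Delete each argument name; fails if a name was already removed."""
    for name in names:
        del globals_dict[name]  # raises KeyError on the second attempt


def delete_args_guarded(globals_dict, names):
    """Delete each argument name only if it is still present."""
    for name in names:
        if name in globals_dict:
            del globals_dict[name]


shared = {"iter": 3, "GD": {}}
delete_args_guarded(shared, ["iter"])
delete_args_guarded(shared, ["iter"])  # second call is a harmless no-op
```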

A new regression test is included in the patch.

--
Best regards,
Alexey Grishchenko

Attachments:

0001-Fix-endless-loop-in-plpython-set-returning-function.patch (application/octet-stream, +49 -14)
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexey Grishchenko (#1)
Re: Endless loop calling PL/Python set returning functions

Alexey Grishchenko <agrishchenko@pivotal.io> writes:

> There is a bug in the implementation of set-returning functions in PL/Python.
> When the same set-returning function is called twice in a single query, the
> executor falls into an infinite loop that ends in an OOM.

Ugh.

> Another issue with calling the same set-returning function twice in the
> same query is that each call deletes the function's input parameters from
> the globals dictionary when it finishes. With two calls, this code tries to
> delete the same entry from the globals dictionary twice, which raises a
> KeyError. This is why PLy_function_delete_args is modified as well: it now
> checks whether the key is present in the globals dictionary before deleting
> it.

That whole business with putting a function's parameters into a global
dictionary makes me itch. Doesn't it mean problems if one plpython
function calls another (presumably via SPI)?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Alexey Grishchenko
agrishchenko@pivotal.io
In reply to: Tom Lane (#2)
Re: Endless loop calling PL/Python set returning functions

I agree that passing function parameters through globals is not the best
solution.

It works as follows: custom code (in our case, the Python function
invocation) is executed in Python with PyEval_EvalCode
<https://docs.python.org/2/c-api/veryhigh.html>. As input to this C
function you pass the dictionary of globals that will be available to the
code. The PLyProcedure structure stores "PyObject *globals;", which is the
globals dictionary for a specific function. So SPI works fine, since each
function has its own globals dictionary and they don't conflict with each
other.
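
As a rough Python-level analogue of the mechanism just described, the
built-in exec() accepts a globals dictionary in much the same way as
PyEval_EvalCode. make_procedure and call_procedure are hypothetical helpers,
and returning the value through a "result" global is a simplification of this
sketch, not how PL/Python returns values:

```python
def make_procedure(source):
    """Compile the source once and pair it with its own globals dictionary."""
    code = compile(source, "<plpython sketch>", "exec")
    return {"code": code, "globals": {}}


def call_procedure(proc, **args):
    """Inject the arguments as globals, run the code, return its result."""
    g = proc["globals"]
    g.update(args)         # arguments become global names
    exec(proc["code"], g)  # the code sees only this procedure's globals
    return g["result"]


outer = make_procedure("result = a + 1")
inner = make_procedure("result = a * 10")

first = call_procedure(outer, a=1)
second = call_procedure(inner, a=5)  # does not disturb outer's "a"
```

Two different functions can both use a parameter called "a" because each one
has its own dictionary; the trouble starts only when the same function, and
therefore the same dictionary, is active more than once.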

One scenario where the problem occurs is calling the same set-returning
function twice in a single query. The two calls then share the same
"globals", which is not a bad thing in itself, but when one call finishes
and removes its input parameter from the globals, the second fails trying
to do the same. I included the fix for this problem in my patch.

The second scenario where the problem occurs is calling the same PL/Python
function recursively. For example, this code will not work:

create or replace function test(a int) returns int as $BODY$
r = 0
if a > 1:
    r = plpy.execute("SELECT test(%d) as a" % (a-1))[0]['a']
return a + r
$BODY$ language plpythonu;

select test(10);

The function "test" has a single PLyProcedure object allocated to handle
it, and thus a single "globals" dictionary. When the inner function call
finishes, it removes the key "a" from the dictionary, and the outer call
fails with "NameError: global name 'a' is not defined" when it tries to
execute "return a + r".
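
The recursion failure can be reproduced outside the database with a small
model. Here call() is a hypothetical stand-in for the PL/Python call handler
plus plpy.execute, and shared_globals plays the role of the procedure's
single "globals" dictionary:

```python
SOURCE = """
def body():
    r = 0
    if a > 1:
        r = call(a - 1)  # stands in for plpy.execute("SELECT test(...)")
    return a + r
"""

shared_globals = {}


def call(value):
    """Set the argument as a global, run the body, then remove the argument."""
    shared_globals["a"] = value
    try:
        return shared_globals["body"]()
    finally:
        shared_globals.pop("a", None)  # the inner call removes the outer "a"


shared_globals["call"] = call
exec(SOURCE, shared_globals)

base_case = call(1)  # no recursion, so this works
try:
    call(2)
    error = None
except NameError as exc:  # raised by "return a + r" in the outer call
    error = exc
```

Python 3 words the message slightly differently from the Python 2 text
quoted above, but the exception type is the same.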

But the second issue is a separate story, and I think it deserves a separate
patch.

--
Best regards,
Alexey Grishchenko

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexey Grishchenko (#3)
Re: Endless loop calling PL/Python set returning functions

Alexey Grishchenko <agrishchenko@pivotal.io> writes:

> One scenario where the problem occurs is calling the same set-returning
> function twice in a single query. The two calls then share the same
> "globals", which is not a bad thing in itself, but when one call finishes
> and removes its input parameter from the globals, the second fails trying
> to do the same. I included the fix for this problem in my patch.
>
> The second scenario where the problem occurs is calling the same PL/Python
> function recursively. For example, this code will not work:

Right, the recursion case is what's not being covered by this patch.
I would rather have a single patch that deals with both of those cases,
perhaps by *not* sharing the same dictionary across calls. I think
what you've done here is not so much a fix as a band-aid. In fact,
it doesn't even really fix the problem for the two-calls-per-query
case does it? It'll work if the first execution of the SRF is run to
completion before starting the second one, but not if the two executions
are interleaved. I believe you can test that type of scenario with
something like

select set_returning_function_1(...), set_returning_function_2(...);

regards, tom lane


#5Alexey Grishchenko
agrishchenko@pivotal.io
In reply to: Tom Lane (#4)
Re: Endless loop calling PL/Python set returning functions

No, my fix handles this well.

In fact, on the first function call you allocate the global variables
representing the Python function's input parameters, call the function, and
receive an iterator over its results. Then, in the series of subsequent
Postgres calls to the PL/Python handler, you just fetch the next value from
the iterator; you no longer call the Python function. When the iterator
reaches its end, the PL/Python call handler deallocates the global variables
representing the function's input parameters.

Regardless of the number of parallel invocations of the same function, with
my patch each of them sets its own input parameters for the Python function,
calls the function, and receives a separate iterator. When the first
invocation's result iterator reaches its end, it deallocates the input
global variable. This doesn't affect the other invocations, as they no
longer need to run any Python code. Even if they did, they would set the
global variable again (it is always set before the Python function is
invoked). The issue was that the invocations tried to deallocate the global
input variable multiple times independently, which caused the error that I
fixed.

Regarding a patch for the second case, recursion: not caching the "globals"
between function calls would have a performance impact, as you would have to
construct the "globals" object before each function call. And you need
globals, as it contains references to the "plpy" module, the global
variables, and the global dictionary ("GD"). I will think about this; there
might be a better design for this scenario. But I still think the second
scenario requires a separate patch.

--
Best regards,
Alexey Grishchenko

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexey Grishchenko (#5)
Re: Endless loop calling PL/Python set returning functions

Alexey Grishchenko <agrishchenko@pivotal.io> writes:

> No, my fix handles this well.
> In fact, on the first function call you allocate the global variables
> representing the Python function's input parameters, call the function, and
> receive an iterator over its results. Then, in the series of subsequent
> Postgres calls to the PL/Python handler, you just fetch the next value from
> the iterator; you no longer call the Python function. When the iterator
> reaches its end, the PL/Python call handler deallocates the global variables
> representing the function's input parameters.
>
> Regardless of the number of parallel invocations of the same function, with
> my patch each of them sets its own input parameters for the Python function,
> calls the function, and receives a separate iterator. When the first
> invocation's result iterator reaches its end, it deallocates the input
> global variable. This doesn't affect the other invocations, as they no
> longer need to run any Python code.

Well, if you think that works, why not undo the global-dictionary changes
at the end of the first call, rather than later? Then there's certainly
no overlap in their lifespan.

regards, tom lane


#7Alexey Grishchenko
agrishchenko@pivotal.io
In reply to: Tom Lane (#6)
Re: Endless loop calling PL/Python set returning functions

Could you elaborate on this? In general, a stack-like solution would work:
if, before the function call, a global variable already exists with a name
matching an input parameter, push its value onto a stack and pop it after
the function executes. I will implement it tomorrow and see how it works.

--

Sent from handheld device

#8Alexey Grishchenko
agrishchenko@pivotal.io
In reply to: Alexey Grishchenko (#7)
Re: Endless loop calling PL/Python set returning functions

I have improved the code using the proposed approach. The second version of
the patch is attached.

It works as follows: the procedure object PLyProcedure stores the call stack
depth (calldepth field) and the stack itself (argstack field). When the call
stack depth is zero we do no additional processing, i.e. there is no
performance impact for existing end-user functions. Stack manipulation kicks
in only when calldepth is greater than zero, which happens either when the
function is called recursively through SPI, or when the same set-returning
function is called two or more times in a single query.
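
The bookkeeping described above can be sketched in plain Python. The field
names calldepth and argstack come from the description; the Procedure class,
its call() method and recursive_body are hypothetical and only model the
recursion case, not the patch's C code:

```python
class Procedure:
    """Hypothetical model of a procedure with calldepth and argstack fields."""

    def __init__(self, func):
        self.func = func
        self.globals = {}
        self.calldepth = 0
        self.argstack = []

    def call(self, **args):
        if self.calldepth > 0:
            # Nested call: save the outer call's values of these names.
            saved = {k: self.globals[k] for k in args if k in self.globals}
            self.argstack.append(saved)
        self.calldepth += 1
        self.globals.update(args)
        try:
            return self.func(self, self.globals)
        finally:
            self.calldepth -= 1
            for name in args:
                self.globals.pop(name, None)
            if self.calldepth > 0:
                # Returning into an outer call: restore its arguments.
                self.globals.update(self.argstack.pop())


def recursive_body(proc, g):
    """Same logic as the recursive "test" function, reading "a" from globals."""
    r = 0
    if g["a"] > 1:
        r = proc.call(a=g["a"] - 1)
    return g["a"] + r


test_proc = Procedure(recursive_body)
total = test_proc.call(a=10)  # 10 + 9 + ... + 1
```

At depth zero the stack is never touched, which mirrors the claim that
ordinary non-nested calls pay no extra cost.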

Example of multiple calls to an SRF within a single query:

CREATE OR REPLACE FUNCTION func(iter int) RETURNS SETOF int AS $$
return xrange(iter)
$$ LANGUAGE plpythonu;

select func(3), func(4);

Before the patch, this query caused an endless loop ending in OOM. Now it
works as it should.

Example of recursion with SPI:

CREATE OR REPLACE FUNCTION test(a int) RETURNS int AS $BODY$
r = 0
if a > 1:
    r = plpy.execute("SELECT test(%d) as a" % (a-1))[0]['a']
return a + r
$BODY$ LANGUAGE plpythonu;

select test(10);

Before the patch, this query failed with "NameError: global name 'a' is not
defined". Now it works correctly and returns 55.

--
Best regards,
Alexey Grishchenko

Attachments:

0002-Fix-endless-loop-in-plpython-set-returning-function.patch (application/octet-stream, +177 -14)
#9Alexey Grishchenko
agrishchenko@pivotal.io
In reply to: Alexey Grishchenko (#8)
Re: Endless loop calling PL/Python set returning functions

Hi

Any comments on this patch?

Regarding passing parameters to the Python function through globals: this
was part of the initial design of PL/Python (code
<https://github.com/postgres/postgres/blob/0bef7ba549977154572bdbf5682a32a07839fd82/src/pl/plpython/plpython.c#L783>,
documentation
<http://www.postgresql.org/docs/7.2/static/plpython-using.html>).
Originally you had to work with the global "args" list of input parameters
and could not access named parameters directly. You can still use "args"
even with the latest release. Moving away from global input parameters would
require switching to PyObject_CallFunctionObjArgs
<https://docs.python.org/2/c-api/object.html#c.PyObject_CallFunctionObjArgs>,
which should be possible by changing the function declaration to include the
input parameters plus "args" (for backward compatibility). However, triggers
are a bit different: they depend on modifying the global "TD" dictionary
inside the Python function, and they return only a status string. For them,
there is no way to avoid global input parameters without breaking backward
compatibility with existing end-user code.

--
Best regards,
Alexey Grishchenko

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexey Grishchenko (#9)
Re: Endless loop calling PL/Python set returning functions

Alexey Grishchenko <agrishchenko@pivotal.io> writes:

> Any comments on this patch?

I felt that this was more nearly a bug fix than a new feature, so I picked
it up even though it's nominally in the next commitfest not the current
one. I did not like the code too much as it stood: you were not being
paranoid enough about ensuring that the callstack data structure stayed
in sync with the actual control flow. Also, it didn't work for functions
that modify their argument values (cf the committed regression tests);
you have to explicitly save named arguments not only the "args" version,
and you have to do it for SRF suspend/resume not just recursion cases.
But I cleaned all that up and committed it.
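
The point about SRF suspend/resume can be illustrated with a Python
generator, which looks up global names each time it is resumed. This is a
sketch of the idea only, not the committed C code; SrfCall, fetch and
drain_interleaved are hypothetical names:

```python
SOURCE = """
def body():
    i = 0
    while i < n:  # "n" is looked up in globals on every resume
        yield i
        i += 1
"""

shared_globals = {}
exec(SOURCE, shared_globals)


class SrfCall:
    """Hypothetical per-call-site state: the iterator plus its arguments."""

    def __init__(self, **args):
        self.saved_args = dict(args)
        self.iterator = None

    def fetch(self):
        shared_globals.update(self.saved_args)  # resume: install arguments
        try:
            if self.iterator is None:
                self.iterator = shared_globals["body"]()
            return next(self.iterator, None)
        finally:
            # Suspend: save the current values (the body may have changed
            # them) and remove them from the shared dictionary.
            for name in self.saved_args:
                current = shared_globals.pop(name, self.saved_args[name])
                self.saved_args[name] = current


def drain_interleaved(a, b):
    """Alternate between two call sites until both are exhausted."""
    rows = []
    while True:
        row = (a.fetch(), b.fetch())
        if row == (None, None):
            return rows
        rows.append(row)


rows = drain_interleaved(SrfCall(n=3), SrfCall(n=4))
```

Without the save and restore around each fetch, the second call site's "n"
would overwrite the first one's, and the first generator would run to 4
instead of 3.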

> triggers are a bit different: they depend on modifying the global "TD"
> dictionary inside the Python function, and they return only a status
> string. For them, there is no way to avoid global input parameters without
> breaking backward compatibility with existing end-user code.

Yeah. It might be worth the trouble to include triggers in the
save/restore logic, since at least in principle they can be invoked
recursively; but there's not that much practical use for such cases.
I didn't bother with that in the patch as-committed, but if you want
to follow up with an adjustment for it, I'd take a look.

regards, tom lane
