python module pre-import to avoid importing each time
Hey List,
I use plpython with postgis and two python modules (numpy and shapely).
Sadly, importing these modules inside a plpython function is very slow
(several hundred milliseconds).
I also don't know whether this overhead is paid each time the function is
called in the same session.
Is there a way to pre-import those modules once and for all,
so that the python functions are faster?
Thanks,
Cheers,
Rémi-C
On Thu, Jun 19, 2014 at 7:50 AM, Rémi Cura <remi.cura@gmail.com> wrote:
> Hey List,
> I use plpython with postgis and two python modules (numpy and shapely).
> Sadly, importing these modules inside a plpython function is very slow
> (several hundred milliseconds).
Is that mostly shapely (which I don't have)? numpy seems to be pretty
fast, like 16ms. But that is still slow for what you want, perhaps.
> I also don't know whether this overhead is paid each time the function is
> called in the same session.
It is not. The overhead is once per connection, not once per call.
So using a connection pooler could really help here.
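To make that concrete, a pooler such as PgBouncer (not named in the thread; the settings below are a minimal assumed setup, not a recommendation from the list) keeps server backends alive between clients, so the per-connection import cost is paid once per pooled backend rather than once per client connection:

```
; pgbouncer.ini -- illustrative minimal configuration
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; session pooling: one server backend per client session,
; reused across sessions -- plpython imports survive between clients
pool_mode = session
```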
> Is there a way to pre-import those modules once and for all,
> so that the python functions are faster?
I don't think there is. With plperl you can do this by loading the
module in plperl.on_init and by putting plperl into
shared_preload_libraries so that this happens just at server start up.
But I don't see a way to do something analogous for plpython due to
lack of plpython.on_init. I think that is because the infrastructure
to do that is part of making a "trusted" version of the language,
which python doesn't have. (But it could just be that no one has ever
gotten around to adding it.)
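For reference, the plperl arrangement described here looks roughly like this in postgresql.conf (Some::Module is a placeholder, not a real module name):

```
# postgresql.conf -- preload plperl and run code when the interpreter starts
shared_preload_libraries = 'plperl'        # load plperl at server start
plperl.on_init = 'use Some::Module;'       # runs once per interpreter creation
```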
Cheers,
Jeff
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hey,
thanks for your answer !
Yep you are right, the functions I would like to test are going to be called
a lot (100k times), so even 15 ms per call matters.
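A quick back-of-the-envelope check of why that matters, using the numbers above:

```python
# Cost of a 15 ms per-call penalty over 100k calls, if it were paid every time.
calls = 100_000
per_call_ms = 15
total_minutes = calls * per_call_ms / 1000 / 60
print(total_minutes)  # 25.0 -- minutes of pure overhead
```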
I'm still a bit confused by a topic I found here:
http://stackoverflow.com/questions/15023080/how-are-import-statements-in-plpython-handled
The answer gives a trick to avoid importing each time, so it must be
useful somehow.
On another internet page (can't find it anymore) somebody mentioned this
module loading at server startup, one way or another, but gave no
details. It seems that the "plpy" python module gets loaded by default;
wouldn't it be possible to hack this module to add other imports inside it?
I also use PL/R (untrusted I guess) and you can create a special table to
indicate which modules to load at startup.
Cheers,
Rémi-C
2014-06-25 21:46 GMT+02:00 Jeff Janes <jeff.janes@gmail.com>:
On 06/26/2014 02:14 AM, Rémi Cura wrote:
> Hey,
> thanks for your answer!
> Yep you are right, the functions I would like to test are going to be
> called a lot (100k times), so even 15 ms per call matters.
> I'm still a bit confused by a topic I found here:
> http://stackoverflow.com/questions/15023080/how-are-import-statements-in-plpython-handled
> The answer gives a trick to avoid importing each time, so it must be
> useful somehow.
Peter's answer is based on using the SD dictionary to store an
imported library. For more information see here:
http://www.postgresql.org/docs/9.3/interactive/plpython-sharing.html
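For the archives, the SD caching trick that answer describes can be sketched like this (the function name and its numpy body are illustrative, not taken from the thread):

```sql
CREATE OR REPLACE FUNCTION array_sum(vals float8[]) RETURNS float8 AS $$
    # Cache the module object in SD so later calls in the same session
    # skip even the (cheap) re-import and name rebinding.
    if 'np' not in SD:
        import numpy
        SD['np'] = numpy
    np = SD['np']
    return float(np.asarray(vals).sum())
$$ LANGUAGE plpythonu;
```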
> On another internet page (can't find it anymore) somebody mentioned this
> module loading at server startup, one way or another, but gave no
> details. It seems that the "plpy" python module gets loaded by default;
> wouldn't it be possible to hack this module to add other imports inside it?
In a sense that is what is being suggested above.
> I also use PL/R (untrusted I guess) and you can create a special table
> to indicate which modules to load at startup.
> Cheers,
> Rémi-C
--
Adrian Klaver
adrian.klaver@aklaver.com
On Thu, Jun 26, 2014 at 2:14 AM, Rémi Cura <remi.cura@gmail.com> wrote:
> Hey,
> thanks for your answer!
> Yep you are right, the functions I would like to test are going to be called
> a lot (100k times), so even 15 ms per call matters.
> I'm still a bit confused by a topic I found here:
> http://stackoverflow.com/questions/15023080/how-are-import-statements-in-plpython-handled
> The answer gives a trick to avoid importing each time, so it must be
> useful somehow.
I'd want to see the benchmark before deciding how useful it actually is...
Anyway, that seems to be about calling import over and over within the
same connection, not between different connections, as is your issue.
Also, I think that that suggestion is targeted at removing what is
already a very minor overhead, which is importing the symbols from the
module into the importer's namespace (or however you translate that
into python speak). The slow part is loading the module in the first
place (finding the shared objects, parsing the module's source code,
gluing them together, etc.), not importing the python symbols.
If you arrange to re-use connections, you will probably find no
further optimization is needed.
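The distinction between loading a module and merely binding its symbols is easy to see in plain Python (`decimal` below just stands in for numpy or shapely):

```python
import sys
import time

def timed_import(name):
    """Import a module by name and return (module, elapsed_seconds)."""
    t0 = time.perf_counter()
    mod = __import__(name)
    return mod, time.perf_counter() - t0

# The first import pays the real cost: locating the file, parsing and
# compiling the source, and running the module-level code.
m1, t_first = timed_import("decimal")

# A repeated import is essentially a sys.modules dictionary lookup.
m2, t_cached = timed_import("decimal")

print(m1 is m2)            # the same cached module object both times
print(t_cached < t_first)  # the re-import is far cheaper
```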
> On another internet page (can't find it anymore) somebody mentioned this
> module loading at server startup, one way or another, but gave no details.
> It seems that the "plpy" python module gets loaded by default; wouldn't it
> be possible to hack this module to add other imports inside it?
I just thought your question looked lonely and that I'd tell you what
I learned about plperl in case it helped. There may be a way to do
about the same thing in plpython, but if so it doesn't seem to be
documented, or analogous to the way plperl does it. I'm afraid that
exhausts my knowledge of plpython. I don't see any files suggesting
there is a user-editable plpy.py module. If you are willing
to monkey around with C and recompiling, you could probably make it
happen somehow, though.
Cheers,
Jeff
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> On 06/26/2014 02:14 AM, Rémi Cura wrote:
>> On another internet page (can't find it anymore) somebody mentioned this
>> module loading at server startup, one way or another, but gave no
>> details. It seems that the "plpy" python module gets loaded by default;
>> wouldn't it be possible to hack this module to add other imports inside it?
> In a sense that is what is being suggested above.
IIRC, plperl has a GUC you can set to tell it to do things at the time
it's loaded (which of course you use in combination with having listed
plperl in shared_preload_libraries). There's no reason except lack of
round tuits why plpython couldn't have a similar feature.
regards, tom lane
On 06/26/2014 02:14 AM, Rémi Cura wrote:
> Hey,
> thanks for your answer!
> Yep you are right, the functions I would like to test are going to be
> called a lot (100k times), so even 15 ms per call matters.
I got to thinking about this.
100K over what time frame?
How is it being called?
--
Adrian Klaver
adrian.klaver@aklaver.com
Hey,
thanks, now we have good information:
the python packages are really loaded once per connection, so no
optimization is needed there.
Unlike plperl or PL/R, there is no easy way to preload packages.
There may be some solution to make this import happen at connection start,
but it would involve C modifications (I found no trace of a python file or
hackable sql script in the postgres source and install directories).
After that, further optimization is possible by avoiding the useless
'import' (because the module is already loaded); see the trick here:
<http://stackoverflow.com/questions/15023080/how-are-import-statements-in-plpython-handled>
However, the benefits are not proven.
My use case is simple geometry manipulation functions. It is easier to use
plpython rather than plpgsql thanks to numpy for vector manipulation.
Usually the functions are called inside a complex query with many CTEs, and
executed over 100k rows. Total execution time is on the order of minutes.
(example query at the end)
Thanks everybody,
Rémi
Example query:
CREATE TABLE holding_result AS
WITH the_geom AS (
    SELECT gid, geom
    FROM my_big_table -- 200k rows
)
SELECT gid, my_python_function(geom) AS result
FROM the_geom;
2014-06-27 4:27 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com>: