Making plpython 2 and 3 coexist a bit better

Started by Tom Laneabout 10 years ago10 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

Commit 803716013dc1350f installed a safeguard against loading plpython2
and plpython3 at the same time, stating

+   It is not allowed to use PL/Python based on Python 2 and PL/Python
+   based on Python 3 in the same session, because the symbols in the
+   dynamic modules would clash, which could result in crashes of the
+   PostgreSQL server process.  There is a check that prevents mixing
+   Python major versions in a session, which will abort the session if
+   a mismatch is detected.  It is possible, however, to use both
+   PL/Python variants in the same database, from separate sessions.

It turns out though that the freedom promised in the last sentence
is fairly illusory, because if you have functions in both languages
in one DB and you try a dump-and-restore, it will fail.

But it gets worse: a report today in pgsql-general points out that
even if you have the two languages in use *in different databases*,
pg_upgrade will fail, because pg_upgrade rather overenthusiastically
loads every .so mentioned anywhere in the source installation into
one session.

I think we might be able to improve this if we don't do the check in
plpython's _PG_init(), but delay it until we have a need to call into
libpython; which we could skip doing at _PG_init() time, at the cost
of perhaps one more flag check per plpython function call to see if
we've initialized libpython yet.

The behavior would then be that if you load both language .so's
into one session, neither one would be willing to do anything
anymore --- not run a function, and not syntax-check one either,
because any attempted call into either libpython might crash.
But restoring a dump has no need to do either of those things;
it just wants to be able to LOAD the .so. Also, the error for
not being able to do something wouldn't have to be FATAL.

Comments? Am I being too optimistic about whether it's safe to
have both libpythons loaded if we're not calling either of them?

(If I am, we should at least try to improve pg_upgrade so that
it's not imposing a greater restriction than anything else in
the system.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#1)
Re: Making plpython 2 and 3 coexist a bit better

I wrote:

I think we might be able to improve this if we don't do the check in
plpython's _PG_init(), but delay it until we have a need to call into
libpython; which we could skip doing at _PG_init() time, at the cost
of perhaps one more flag check per plpython function call to see if
we've initialized libpython yet.

The behavior would then be that if you load both language .so's
into one session, neither one would be willing to do anything
anymore --- not run a function, and not syntax-check one either,
because any attempted call into either libpython might crash.
But restoring a dump has no need to do either of those things;
it just wants to be able to LOAD the .so. Also, the error for
not being able to do something wouldn't have to be FATAL.

Attached is a draft patch along this line. It seems to work as intended.

I can think of at least a couple of ways in which this test might be
inadequate. One is that a plpython2 function might call a plpython3
function or vice versa. The inner plpython_call_handler would correctly
throw an error, but if the outer function trapped the error and continued
execution, it would be at risk. Another, more hypothetical risk is that
once we've executed anything out of one libpython, it might have changed
process state (eg, installed hooks) in such a way that it can get control
back even without an explicit call to plpython. In that case, a
subsequent load of another Python version would put things at risk for
such execution in the first libpython.

We could ameliorate the first of these cases by putting the can't-run-
with-two-pythons error back up to FATAL rather than ERROR, but I'm not
sure that that would be a net improvement --- FATAL errors aren't very
friendly. In any case errors of the second type seem unpreventable
unless we stick with the immediate-FATAL-error approach.

Since plpython only comes in an untrusted (hence superuser-only) flavor,
holes like this wouldn't be security issues but only "well, you shouldn't
do that" problems. I'm inclined to think it's a good tradeoff for being
able to dump/restore/upgrade mixed installations, which are surely
likely to get more popular.

Thoughts?

regards, tom lane

Attachments:

relax-python-version-conflict-test.patchtext/x-diff; charset=us-ascii; name=relax-python-version-conflict-test.patchDownload+88-85
#3Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#2)
Re: Making plpython 2 and 3 coexist a bit better

On 1/11/16 11:51 AM, Tom Lane wrote:

We could ameliorate the first of these cases by putting the can't-run-
with-two-pythons error back up to FATAL rather than ERROR, but I'm not
sure that that would be a net improvement --- FATAL errors aren't very
friendly. In any case errors of the second type seem unpreventable
unless we stick with the immediate-FATAL-error approach.

Something that's always concerned me about functions in other languages
is that any kind of snafu in the function/language can hose the backend,
which you may or may not detect. I've used other databases that (by
default) spin up a separate process for executing functions, maybe we
could do something like that? If we treated 2 and 3 as different
languages you could actually use both at the same time in a single
backend. The only thing that's not clear to me is how you'd be able to
re-enter the process during recursive/nested calls.

Obviously this is a lot more work than what you're proposing though. :(
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#3)
Re: Making plpython 2 and 3 coexist a bit better

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

Something that's always concerned me about functions in other languages
is that any kind of snafu in the function/language can hose the backend,
which you may or may not detect. I've used other databases that (by
default) spin up a separate process for executing functions, maybe we
could do something like that? If we treated 2 and 3 as different
languages you could actually use both at the same time in a single
backend. The only thing that's not clear to me is how you'd be able to
re-enter the process during recursive/nested calls.

There's at least one PL/Java implementation that does that. The
interprocess communication overhead is pretty awful, IIRC. Don't know
what they do about nested calls.

Obviously this is a lot more work than what you're proposing though. :(

Yeah. I think what I'm suggesting is a back-patchable fix, which that
certainly wouldn't be.

I realized that the patch I put up before would do the Wrong Thing if
an updated library were loaded followed by a not-updated library for
the other python version. Ideally that couldn't happen, but considering
that the two plpythons are likely to get packaged in separate subpackages
(certainly Red Hat does things that way), it's not too hard to envision
cases where it would. So the attached update corrects that and fixes the
docs too. We can lose the whole test in HEAD, but I think it's necessary
in the back branches.

The question of whether to do ERROR or FATAL remains open. I'm not sure
I have a strong preference either way.

regards, tom lane

Attachments:

relax-python-version-conflict-test-2.patchtext/x-diff; charset=us-ascii; name=relax-python-version-conflict-test-2.patchDownload+81-74
#5Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#4)
Re: Making plpython 2 and 3 coexist a bit better

On 1/11/16 1:00 PM, Tom Lane wrote:

There's at least one PL/Java implementation that does that. The
interprocess communication overhead is pretty awful, IIRC. Don't know
what they do about nested calls.

You'd think that pipes wouldn't be that much overhead...

Obviously this is a lot more work than what you're proposing though.:(

Yeah. I think what I'm suggesting is a back-patchable fix, which that
certainly wouldn't be.

Yeah, and it sounds like we need one.

The question of whether to do ERROR or FATAL remains open. I'm not sure
I have a strong preference either way.

If they both get loaded is there risk of bad data happening? Personally,
I'll take a traceable FATAL (or even PANIC) over data corruption every
time. But I'm guessing that if you tried to use both you'd pretty
immediately end up crashing the backend.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#5)
Re: Making plpython 2 and 3 coexist a bit better

Jim Nasby <Jim.Nasby@bluetreble.com> writes:

On 1/11/16 1:00 PM, Tom Lane wrote:

The question of whether to do ERROR or FATAL remains open. I'm not sure
I have a strong preference either way.

If they both get loaded is there risk of bad data happening? Personally,
I'll take a traceable FATAL (or even PANIC) over data corruption every
time. But I'm guessing that if you tried to use both you'd pretty
immediately end up crashing the backend.

Yeah, the conservative solution would definitely be to use FATAL.
It's still a usability improvement over what we have now.

In further experimentation, I found out that even with this patch in
place, the plpython transform extensions still present a dump/reload
hazard:

test=# create extension plpython2u;
CREATE EXTENSION
test=# create extension ltree_plpython3u cascade;
NOTICE: installing required extension "ltree"
NOTICE: installing required extension "plpython3u"
ERROR: multiple Python libraries are present in session
DETAIL: Only one Python major version can be used in one session.

AFAICS, the reason for that is this code in the transform extension
scripts:

-- make sure the prerequisite libraries are loaded
DO '1' LANGUAGE plpython3u;

It does appear that we need something for this, because if you just
remove it you get failures like

Symbol not found: _PLyUnicode_FromStringAndSize

But I wonder why we couldn't make it do

LOAD 'plpython3';

instead. If I change the script like that, it seems to go through fine
even with both Python libraries loaded, because CREATE TRANSFORM does not
actually call into the language implementation.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: Making plpython 2 and 3 coexist a bit better

I wrote:

In further experimentation, I found out that even with this patch in
place, the plpython transform extensions still present a dump/reload
hazard:
...
But I wonder why we couldn't make it do
LOAD 'plpython3';

I've verified that the attached additional patch makes everything behave
nicely in this corner of the world. AFAICS this adds no assumptions
that aren't already built into pg_pltemplate.

Last call for objections ...

regards, tom lane

Attachments:

transform-load-fix.patchtext/x-diff; charset=us-ascii; name=transform-load-fix.patchDownload+16-16
#8Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#1)
Re: Making plpython 2 and 3 coexist a bit better

On Mon, Jan 11, 2016 at 10:44:42AM -0500, Tom Lane wrote:

Commit 803716013dc1350f installed a safeguard against loading plpython2
and plpython3 at the same time, stating

+   It is not allowed to use PL/Python based on Python 2 and PL/Python
+   based on Python 3 in the same session, because the symbols in the
+   dynamic modules would clash, which could result in crashes of the
+   PostgreSQL server process.  There is a check that prevents mixing
+   Python major versions in a session, which will abort the session if
+   a mismatch is detected.  It is possible, however, to use both
+   PL/Python variants in the same database, from separate sessions.

It turns out though that the freedom promised in the last sentence
is fairly illusory, because if you have functions in both languages
in one DB and you try a dump-and-restore, it will fail.

But it gets worse: a report today in pgsql-general points out that
even if you have the two languages in use *in different databases*,
pg_upgrade will fail, because pg_upgrade rather overenthusiastically
loads every .so mentioned anywhere in the source installation into
one session.

The fix for that would be for pg_upgrade to change
check_loadable_libraries() to start a new session for each LOAD command.
Would you like that done?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#8)
Re: Making plpython 2 and 3 coexist a bit better

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Jan 11, 2016 at 10:44:42AM -0500, Tom Lane wrote:

But it gets worse: a report today in pgsql-general points out that
even if you have the two languages in use *in different databases*,
pg_upgrade will fail, because pg_upgrade rather overenthusiastically
loads every .so mentioned anywhere in the source installation into
one session.

The fix for that would be for pg_upgrade to change
check_loadable_libraries() to start a new session for each LOAD command.
Would you like that done?

Not necessary anymore, at least not for PL/Python; and that solution
sounds a tad expensive.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#9)
Re: Making plpython 2 and 3 coexist a bit better

On Mon, Jan 18, 2016 at 01:10:05PM -0500, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Jan 11, 2016 at 10:44:42AM -0500, Tom Lane wrote:

But it gets worse: a report today in pgsql-general points out that
even if you have the two languages in use *in different databases*,
pg_upgrade will fail, because pg_upgrade rather overenthusiastically
loads every .so mentioned anywhere in the source installation into
one session.

The fix for that would be for pg_upgrade to change
check_loadable_libraries() to start a new session for each LOAD command.
Would you like that done?

Not necessary anymore, at least not for PL/Python; and that solution
sounds a tad expensive.

Yes, it would certainly be inefficient.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers