Add on_perl_init and proper destruction to plperl [PATCH]
This is the third of the patches to be split out from the former 'plperl
feature patch 1'.
Changes in this patch:
- Added plperl.on_perl_init GUC for DBA use (PGC_SIGHUP)
SPI functions are not available when the code is run.
- Added normal interpreter destruction behaviour
END blocks, if any, are run then objects are
destroyed, calling their DESTROY methods, if any.
SPI functions will die if called at this time.
Tim.
Attachments:
plperl-initend.patch (text/x-patch; charset=us-ascii), +213 -105
Tim Bunce wrote:
- Added plperl.on_perl_init GUC for DBA use (PGC_SIGHUP)
SPI functions are not available when the code is run.
- Added normal interpreter destruction behaviour
END blocks, if any, are run then objects are
destroyed, calling their DESTROY methods, if any.
SPI functions will die if called at this time.
OK, we've made good progress with the PL/Perl patches, and this one is
next on the queue.
It should also be noted that as proposed END blocks will not run at all
in the postmaster, even if perl is preloaded in the postmaster and the
preloaded code sets END handlers. That makes setting them rather safer,
ISTM.
So, are there still objections to applying this patch?
(Note, this is different from the proposal to specify on_trusted_init
and on_untrusted_init handlers. The on_perl_init handler would be run on
library load, and is mainly for the purpose of preloading perl modules
and the like).
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
Tim Bunce wrote:
- Added plperl.on_perl_init GUC for DBA use (PGC_SIGHUP)
SPI functions are not available when the code is run.
- Added normal interpreter destruction behaviour
END blocks, if any, are run then objects are
destroyed, calling their DESTROY methods, if any.
SPI functions will die if called at this time.
So, are there still objections to applying this patch?
Yes.
regards, tom lane
On Tue, Jan 26, 2010 at 23:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Tim Bunce wrote:
- Added plperl.on_perl_init GUC for DBA use (PGC_SIGHUP)
SPI functions are not available when the code is run.
- Added normal interpreter destruction behaviour
END blocks, if any, are run then objects are
destroyed, calling their DESTROY methods, if any.
SPI functions will die if called at this time.
So, are there still objections to applying this patch?
Yes.
FWIW the atexit scares me too. I was thinking a good workaround
perhaps would be to provide a function that destroys the interpreter
(so that the END blocks get called). Tim would that work OK ? If we
are still worried about that hanging we can probably do something
hacky with alarm() and/or signals...
Maybe a good solid use case will help figure this out? I'm assuming
the current one is to profile plperl functions and dump a prof file in
/tmp/ or some such (which happens at END time). Or did I miss the use
case in one of the other threads?
On Wed, Jan 27, 2010 at 12:46:42AM -0700, Alex Hunsaker wrote:
On Tue, Jan 26, 2010 at 23:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Tim Bunce wrote:
- Added plperl.on_perl_init GUC for DBA use (PGC_SIGHUP)
SPI functions are not available when the code is run.
- Added normal interpreter destruction behaviour
END blocks, if any, are run then objects are
destroyed, calling their DESTROY methods, if any.
SPI functions will die if called at this time.
So, are there still objections to applying this patch?
Yes.
FWIW the atexit scares me too.
In what way, specifically?
I understand concerns about interacting with the database, so the
patch ensures that any use of spi functions throws an exception.
I don't recall any other concrete concerns.
Specifically, how is code that starts executing at the end of a session
different in risk to code that starts executing before the end of a session?
DO $$ while (1) { } $$ language plperl;
Tim.
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Tim Bunce wrote:
- Added plperl.on_perl_init GUC for DBA use (PGC_SIGHUP)
SPI functions are not available when the code is run.
- Added normal interpreter destruction behaviour
END blocks, if any, are run then objects are
destroyed, calling their DESTROY methods, if any.
SPI functions will die if called at this time.
So, are there still objections to applying this patch?
Yes.
I see I asked the wrong question. Start again.
What more should be done to make all or some of it acceptable?
cheers
andrew
On Wed, Jan 27, 2010 at 01:14:16AM -0500, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Tim Bunce wrote:
- Added plperl.on_perl_init GUC for DBA use (PGC_SIGHUP)
SPI functions are not available when the code is run.
- Added normal interpreter destruction behaviour
END blocks, if any, are run then objects are
destroyed, calling their DESTROY methods, if any.
SPI functions will die if called at this time.
So, are there still objections to applying this patch?
Yes.
To focus the discussion I've looked back through all the messages from
you that relate to this issue so I can summarize and try to address your
objections.
Some I've split or presented out of order, most relate to earlier (less
restricted) versions of the patch before it was split out, and naturally
they are lacking some context, so I've included archive URLs.
Please forgive and correct me if I misrepresent you or your intent here.
Regarding the utility of plperl.on_perl_init and END:
http://archives.postgresql.org/message-id/18338.1260033447@sss.pgh.pa.us
The question is not about whether we think it's useful; the question
is about whether it's safe.
I agree.
Regarding visibility of changes to plperl.on_perl_init:
http://archives.postgresql.org/message-id/28618.1259952660@sss.pgh.pa.us
What is to happen if the admin changes the value when the system is already up?
If a GUC could be defined as PGC_BACKEND and only settable by superuser,
perhaps that would be a good fit. [GucContext seems to conflate some things.]
Meanwhile the _init name is meant to convey the fact that it's a
before-first-use GUC, like temp_buffers.
I'm happy to accept whatever you'd recommend by way of PGC_* GUC selection.
Documentation can note any caveats associated with combining
plperl.on_perl_init with shared_preload_libraries.
http://archives.postgresql.org/message-id/4516.1263168347@sss.pgh.pa.us
However, I think PGC_SIGHUP would be enough to address my basic
worry, which is that people shouldn't be depending on the ability to set
these things within an individual session.
The patch uses PGC_SIGHUP for plperl.on_perl_init.
http://archives.postgresql.org/message-id/8950.1259994082@sss.pgh.pa.us
Tom, what's your objection to Shlib load time being user-visible?
It's not really designed to be user-visible. Let me give you just
two examples:
* We call a plperl function for the first time in a session, causing
plperl.so to be loaded. Later the transaction fails and is rolled
back. If loading plperl.so caused some user-visible things to happen,
should those be rolled back? If so, how do we get perl to play along?
If not, how do we get postgres to play along?
I believe that's addressed by spi functions being disabled when init code runs.
* We call a plperl function for the first time in a session, causing
plperl.so to be loaded. This happens in the context of a superuser
calling a non-superuser security definer function, or perhaps vice
versa. Whose permissions apply to whatever the on_load code tries
to do? (Hint: every answer is wrong.)
I think that relates to on_*trusted_init, not plperl.on_perl_init, and
is also addressed by spi functions being disabled when init code runs.
That doesn't even begin to cover the problems with allowing any of
this to happen inside the postmaster. Recall that the postmaster
does not have any database access.
I believe that's addressed by spi functions being disabled when init code runs.
Furthermore, it is a very long
established reliability principle around here that the postmaster
process should do as little as possible, because every thing that it
does creates another opportunity to have a nonrecoverable failure.
The postmaster can recover if a child crashes, but the other way
round, not so much.
I understand that concern. Ultimately, though, that comes down to the
judgement of DBAs and the trust placed in them. They can already
load arbitrary code via shared_preload_libraries.
http://archives.postgresql.org/message-id/18338.1260033447@sss.pgh.pa.us
I think if we do this the on_perl_init setting should probably be
PGC_POSTMASTER, which would remove any issue about it changing
underneath us.
Yes, if the main intended usage is in combination with preloading perl
at postmaster start, it would be pointless to imagine that PGC_SIGHUP
is useful anyway.
http://archives.postgresql.org/message-id/17793.1260031296@sss.pgh.pa.us
Yeah, in the shower this morning I was thinking that not loading
SPI till after the on_init code runs would alleviate the concerns
about transactionality and permissions --- that would ensure that
whatever on_init does affects only the Perl world and not the database
world.
That's included in the current patch (and also applies to END blocks).
However, we're not out of the woods yet. In a trusted interpreter
(plperl not plperlu), is the on_init code executed before we lock down
the interpreter with Safe? I would think it has to be since the main
point AFAICS is to let you preload code via "use". But then what is
left of the security guarantees of plperl? I can hardly imagine DBAs
wanting to vet a few thousand lines of random Perl code to see if it
contains anything that could be subverted.
plperl.on_perl_init code, set by the DBA, runs before the Safe
compartment is created. Without explicit extra steps the Safe
compartment has no access to code loaded by plperl.on_perl_init.
The Safe compartment (plperl) could get access to loaded code in one of
these ways:
1. by using SQL to call a plperlu function that accesses the code.
2. by the DBA 'sharing' a specific subroutine with the compartment.
3. by the DBA loading a module into the compartment.
There's no formal interface for 2. and 3. at the moment, so the only
official option is 1. (The final patch in the series includes some
building blocks towards an interface for 2 & 3, but doesn't add one.)
If you're willing to also confine the feature to plperlu, then maybe
the risk level could be decreased from insane to merely unreasonable.
I think you could reasonably describe plperl.on_perl_init as effectively
confined to plperlu (because plperl has no access to any new code).
http://archives.postgresql.org/message-id/26766.1263149361@sss.pgh.pa.us
For the record, I think it's a bad idea to run arbitrary
user-defined code in the postmaster, and I think it's a worse idea to
run arbitrary user-defined code at backend shutdown (the END-blocks bit).
I do not care in the least what applications you think this might
enable --- the negative consequences for overall system stability seem
to me to outweigh any possible arguments on that side.
- What happens when the supplied code has errors,
For on_perl_init it throws an exception that propagates to the user
statement that triggered the initialization of perl. It also ensures
that perl is left in a non-initialized state, so any further uses
also fail.
For END blocks an error triggers an exception that's caught by perl.
(As noted above, there's no access to postgres from init or END code.)
- takes an unreasonable amount of time to run,
Unreasonable is in the eye of the DBA, of course, and they
have the discretion to set on_perl_init to fit their needs.
For END blocks, I don't see how this issue is any different from
"users might do something dumb", like DO 'while(1){}' language plperl;
(or plpython, pltcl, or plpgsql for that matter).
- does something unsafe,
Such as? The code can't do anything more unsafe than is already possible.
- depends on the backend not being in an error state already,
The code has no access to postgres, whatever the state.
- etc. etc?
I'd welcome more concrete examples of potential issues.
Tim.
Tim Bunce <Tim.Bunce@pobox.com> writes:
On Wed, Jan 27, 2010 at 12:46:42AM -0700, Alex Hunsaker wrote:
FWIW the atexit scares me too.
In what way, specifically?
It runs too late, and too unpredictably, during the shutdown sequence.
(In particular note that shutdown itself might be fired as an atexit
callback, a move forced on us by exactly the sort of random user code
that you want to add more of. It's not clear whether a Perl-added
atexit would fire before or after that.)
I understand concerns about interacting with the database, so the
patch ensures that any use of spi functions throws an exception.
That assuages my fears to only a tiny degree. SPI is not the only
possible connection between perl code and the rest of the backend.
Indeed, AFAICS the major *point* of these additions is to allow people
to insert unknown other functionality that is likely to interact
with the rest of the backend; a prospect that doesn't make me feel
better about it.
Specifically, how is code that starts executing at the end of a session
different in risk to code that starts executing before the end of a session?
If it runs before the shutdown sequence starts, we know we have a
functioning backend. Once shutdown starts, it's unknown and mostly
untested exactly what subsystems will still work and which won't.
Injecting arbitrary user-written code into an unspecified point in
that sequence is not a recipe for good results.
Lastly, an atexit trigger will still fire during FATAL or PANIC aborts,
which scares me even more. When the house is already afire, it's
not prudent to politely let user-written perl code do whatever it wants
before you get the heck out of there.
regards, tom lane
Tom Lane wrote:
Indeed, AFAICS the major *point* of these additions is to allow people
to insert unknown other functionality that is likely to interact
with the rest of the backend; a prospect that doesn't make me feel
better about it.
No. The major use case we've seen for END blocks is to allow a profiler
to write its data out. That should have zero interaction with the rest
of the backend.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
Tom Lane wrote:
Indeed, AFAICS the major *point* of these additions is to allow people
to insert unknown other functionality that is likely to interact
with the rest of the backend; a prospect that doesn't make me feel
better about it.
No. The major use case we've seen for END blocks is to allow a profiler
to write its data out. That should have zero interaction with the rest
of the backend.
Really? We've found that gprof, for instance, doesn't exactly have
"zero interaction with the rest of the backend" --- there's actually
a couple of different bits in there to help it along, including a
behavioral change during shutdown. I rather doubt that Perl profilers
would turn out much different.
But in any case, I don't believe for a moment that profiling is the only
or even the largest use to which people would try to put this.
regards, tom lane
On Wed, Jan 27, 2010 at 11:13:43AM -0500, Tom Lane wrote:
Tim Bunce <Tim.Bunce@pobox.com> writes:
On Wed, Jan 27, 2010 at 12:46:42AM -0700, Alex Hunsaker wrote:
FWIW the atexit scares me too.
In what way, specifically?
It runs too late, and too unpredictably, during the shutdown sequence.
(In particular note that shutdown itself might be fired as an atexit
callback, a move forced on us by exactly the sort of random user code
that you want to add more of. It's not clear whether a Perl-added
atexit would fire before or after that.)
man atexit says "Functions so registered are called in reverse order".
Since the plperl atexit is called only when a plperl SP or DO is
executed it would fire before any atexit() registered during startup.
The timing and predictability shouldn't be a significant concern if the
plperl subsystem can't interact with the rest of the backend - which is
the intent.
I understand concerns about interacting with the database, so the
patch ensures that any use of spi functions throws an exception.
That assuages my fears to only a tiny degree. SPI is not the only
possible connection between perl code and the rest of the backend.
Could you give me some examples of others?
Indeed, AFAICS the major *point* of these additions is to allow people
to insert unknown other functionality that is likely to interact
with the rest of the backend; a prospect that doesn't make me feel
better about it.
The major point is *not at all* to allow people to interact with the
rest of the backend. I'm specifically trying to limit that.
The major point is simply to allow perl code to clean itself up properly.
Specifically, how is code that starts executing at the end of a session
different in risk to code that starts executing before the end of a session?
If it runs before the shutdown sequence starts, we know we have a
functioning backend. Once shutdown starts, it's unknown and mostly
untested exactly what subsystems will still work and which won't.
Injecting arbitrary user-written code into an unspecified point in
that sequence is not a recipe for good results.
The plperl subsystem is isolated from, and can't interact with, the rest
of the backend during shutdown.
Can you give me examples where that's not the case?
Lastly, an atexit trigger will still fire during FATAL or PANIC aborts,
which scares me even more. When the house is already afire, it's
not prudent to politely let user-written perl code do whatever it wants
before you get the heck out of there.
Again, that point rests on your underlying concern about interaction
between plperl and the rest of the backend. Examples?
Is there some way for plperl.c to detect a FATAL or PANIC abort?
If so, or if one could be added, then we could skip the END code in
those circumstances.
I don't really want to add more GUCs, but perhaps controlling END
block execution via a plperl.destroy_end=bool (default false) would
help address your concerns.
Tim.
Tom Lane <tgl@sss.pgh.pa.us> writes:
[...]
Lastly, an atexit trigger will still fire during FATAL or PANIC aborts,
which scares me even more. When the house is already afire, it's
not prudent to politely let user-written perl code do whatever it wants
before you get the heck out of there.
Is there a reason that these panics don't use _exit(3) to bypass
atexit hooks?
- FChE
Tom Lane wrote:
But in any case, I don't believe for a moment that profiling is the only
or even the largest use to which people would try to put this.
Well, ISTR there have been requests over the years for event handlers
for (among other things) session shutdown, so if you're speculating that
people would use this as an end run around our lack of such things you
could be right. Maybe providing for such handlers in a more general and
at the same time more safe way would be an alternative.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
I see I asked the wrong question. Start again.
What more should be done to make all or some of it acceptable?
I think a "must" is to get rid of the use of atexit(). Possibly an
on_proc_exit callback could be used instead, although I'm not sure how
you'd handle the case of code loaded in the postmaster that would like
corresponding exit-time code to happen in child processes. (OTOH, it
seems likely that it's impossible to make that work correctly anyway.
It certainly isn't going to work the same on EXEC_BACKEND platforms
as anywhere else, and I don't particularly want to see us documenting
that the feature works differently on Windows than elsewhere.)
Dropping the ability to make the postmaster run any such code would go a
very long way towards fixing the above, as well as assuaging other
fears.
The other thing that I find entirely unconvincing is Tim's idea that
shutting off SPI isolates perl from the rest of the backend. I have
no confidence in that, but no real idea of how to do better either :-(.
If you think that shutting off SPI is sufficient, you can find
counterexamples in the CVS history, for instance where we had to take
special measures to prevent Perl from screwing up the locale settings.
I'm afraid that on_perl_init is going to vastly expand the opportunities
for that kind of unwanted side-effect; and the earlier that it runs, the
more likely it's going to be that we can't recover easily.
regards, tom lane
fche@redhat.com (Frank Ch. Eigler) writes:
Tom Lane <tgl@sss.pgh.pa.us> writes:
Lastly, an atexit trigger will still fire during FATAL or PANIC aborts,
which scares me even more. When the house is already afire, it's
not prudent to politely let user-written perl code do whatever it wants
before you get the heck out of there.
Is there a reason that these panics don't use _exit(3) to bypass
atexit hooks?
Well, I don't really want to entirely forbid the use of atexit() ---
I'm just concerned about using it to run arbitrary user-written code.
There might be more limited purposes for which it's a reasonable choice.
regards, tom lane
Tim Bunce <Tim.Bunce@pobox.com> writes:
On Wed, Jan 27, 2010 at 11:13:43AM -0500, Tom Lane wrote:
(In particular note that shutdown itself might be fired as an atexit
callback, a move forced on us by exactly the sort of random user code
that you want to add more of. It's not clear whether a Perl-added
atexit would fire before or after that.)
man atexit says "Functions so registered are called in reverse order".
Since the plperl atexit is called only when a plperl SP or DO is
executed it would fire before any atexit() registered during startup.
Right, which means that it would occur either before or after
on_proc_exit processing, depending on whether we got there through
an exit() call or via the normal proc_exit sequence. That's just
the kind of instability I don't want to have to debug.
The plperl subsystem is isolated from, and can't interact with, the rest
of the backend during shutdown.
This is exactly the claim that I have zero confidence in. Quite
frankly, the problem with Perl as an extension language is that Perl was
never designed to be a subsystem: it feels free to mess around with the
entire state of the process. We've been burnt multiple times by that
even with the limited use we make of Perl now, and these proposed
additions are going to make it a lot worse IMO.
regards, tom lane
On Jan 27, 2010, at 9:08 AM, Tom Lane wrote:
This is exactly the claim that I have zero confidence in. Quite
frankly, the problem with Perl as an extension language is that Perl was
never designed to be a subsystem: it feels free to mess around with the
entire state of the process. We've been burnt multiple times by that
even with the limited use we make of Perl now, and these proposed
additions are going to make it a lot worse IMO.
Can you provide an example? Such concerns are impossible to address without concrete examples.
Best,
David
"David E. Wheeler" <david@kineticode.com> writes:
On Jan 27, 2010, at 9:08 AM, Tom Lane wrote:
This is exactly the claim that I have zero confidence in. Quite
frankly, the problem with Perl as an extension language is that Perl was
never designed to be a subsystem: it feels free to mess around with the
entire state of the process. We've been burnt multiple times by that
even with the limited use we make of Perl now, and these proposed
additions are going to make it a lot worse IMO.
Can you provide an example? Such concerns are impossible to address without concrete examples.
Two examples that I can find in a quick review of our CVS history: perl
stomping on the process's setlocale state, and perl stomping on the
stdio state (Windows only).
regards, tom lane
On Jan 27, 2010, at 10:08 AM, Tom Lane wrote:
Two examples that I can find in a quick review of our CVS history: perl
stomping on the process's setlocale state, and perl stomping on the
stdio state (Windows only).
Are there links to those commits?
Thanks,
David
On Wed, Jan 27, 2010 at 12:08:48PM -0500, Tom Lane wrote:
Tim Bunce <Tim.Bunce@pobox.com> writes:
On Wed, Jan 27, 2010 at 11:13:43AM -0500, Tom Lane wrote:
(In particular note that shutdown itself might be fired as an atexit
callback, a move forced on us by exactly the sort of random user code
that you want to add more of. It's not clear whether a Perl-added
atexit would fire before or after that.)
man atexit says "Functions so registered are called in reverse order".
Since the plperl atexit is called only when a plperl SP or DO is
executed it would fire before any atexit() registered during startup.
Right, which means that it would occur either before or after
on_proc_exit processing, depending on whether we got there through
an exit() call or via the normal proc_exit sequence. That's just
the kind of instability I don't want to have to debug.
Okay. I could change the callback code to ignore calls if
proc_exit_inprogress is false. So an abnormal shutdown via exit()
wouldn't involve plperl at all. (Alternatively I could use
on_proc_exit() instead of atexit() to register the callback.)
Would that address this call sequence instability issue?
The plperl subsystem is isolated from, and can't interact with, the
rest of the backend during shutdown.
This is exactly the claim that I have zero confidence in. Quite
frankly, the problem with Perl as an extension language is that Perl was
never designed to be a subsystem: it feels free to mess around with the
entire state of the process. We've been burnt multiple times by that
even with the limited use we make of Perl now, and these proposed
additions are going to make it a lot worse IMO.
On Wed, Jan 27, 2010 at 09:53:44AM -0800, David E. Wheeler wrote:
Can you provide an example? Such concerns are impossible to address
without concrete examples.
On Wed, Jan 27, 2010 at 01:08:56PM -0500, Tom Lane wrote:
Two examples that I can find in a quick review of our CVS history: perl
stomping on the process's setlocale state, and perl stomping on the
stdio state (Windows only).
Neither of those relates to the actions of perl source code.
To address that, instead of calling perl_destruct() to perform a
complete destruction I could just execute END blocks and object
destructors. That would avoid executing any system-level actions.
Do you have any examples of how a user could write code in a plperl END
block that would interact with the rest of the backend?
Tim.