BUG #5066: plperl issues with perl_destruct() and END blocks
The following bug has been logged online:
Bug reference: 5066
Logged by: Tim Bunce
Email address: Tim.Bunce@pobox.com
PostgreSQL version: 8.4.1
Operating system: darwin
Description: plperl issues with perl_destruct() and END blocks
Details:
The plperl implementation doesn't call perl_destruct() during server
shutdown.
So any resources held by references, in %_SHARED for example, are not
properly freed. The perl interpreter never gets a chance to cleanup, it's
simply discarded.
Related to the above, plperl should also set PL_exit_flags |=
PERL_EXIT_DESTRUCT_END. Currently any END blocks defined during
initialization get executed at initialization (just before perl_run()
returns). Any END blocks defined later never get run.
Setting PL_exit_flags |= PERL_EXIT_DESTRUCT_END in plperl_init_interp() and
calling perl_destruct() will fix the issue.
The timing of the perl_destruct() call (i.e., early or late in the shutdown
sequence) doesn't matter much. You might want to make the spi_* functions
return an error if there's a shutdown in progress.
"Tim Bunce" <Tim.Bunce@pobox.com> writes:
The plperl implementation doesn't call perl_destruct() during server
shutdown.
Who cares? The process is going away anyway.
regards, tom lane
On Sat, Sep 19, 2009 at 3:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Tim Bunce" <Tim.Bunce@pobox.com> writes:
The plperl implementation doesn't call perl_destruct() during server
shutdown.Who cares? The process is going away anyway.
END {} blocks can execute arbitrary code. Perl users will expect them
to be executed.
...Robert
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, Sep 19, 2009 at 3:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Tim Bunce" <Tim.Bunce@pobox.com> writes:
The plperl implementation doesn't call perl_destruct() during server
shutdown.Who cares? �The process is going away anyway.
END {} blocks can execute arbitrary code. Perl users will expect them
to be executed.
[ shrug... ] As a database geek I find the lack of guarantees about
that to be entirely unsatisfying. What exactly are you going to put
in that block? If it's actually important, are you going to trust
a mechanism that doesn't have any crash safeness?
regards, tom lane
On Sat, Sep 19, 2009 at 11:43:26PM -0400, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, Sep 19, 2009 at 3:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Tim Bunce" <Tim.Bunce@pobox.com> writes:
The plperl implementation doesn't call perl_destruct() during server
shutdown.Who cares? �The process is going away anyway.
END {} blocks can execute arbitrary code. Perl users will expect them
to be executed.[ shrug... ] As a database geek I find the lack of guarantees about
that to be entirely unsatisfying. What exactly are you going to put
in that block? If it's actually important, are you going to trust
a mechanism that doesn't have any crash safeness?
Can you expand on what you mean by 'guarantees' and 'crash safeness'?
I my particular case I'm trying to enable Devel::NYTProf, the perl
source code performance profiler, to profile PL/Perl code.
See http://www.slideshare.net/Tim.Bunce/develnytprof-200907
(starting
NYTProf uses an END block to finish up the profile and write final
details to the data file. One of the first problems I ran into was that
Postgres was executing the END block before the first plperl sub was
even executed. Very counter intuitive. Since NYTProf is implemented in C
(XS) I've worked around that problem by adding an option to enable
PL_exit_flags |= PERL_EXIT_DESTRUCT_END.
But because postgres doesn't call perl_destruct() the problem has just
moved from END blocks being called too early to END blocks not being
called at all. The effect is that the profile data file is unterminated
and so corrupt and unusable.
To finish the profiling users currently have to execute a SQL statement
to trigger a plperl sub that calls an NYTProf sub that finializes the
profile. Ideally I'd like users to be able to finish the profiling
cleanly with a shutdown (to then restart with profiling disabled).
Calling perl_destruct() during shutdown would fix that.
Tim.
p.s. As a random data point, google code search finds about 7000
perl modules containing an END block:
http://www.google.com/codesearch?as_q=%5EEND%5C+%7B&btnG=Search+Code&hl=en&as_lang=perlas_filename=%5C.pm%24&as_case=y
There's a definitional problem here however. When should we call the
destructor? My impression is that it should happen when the calling
query terminates, not when the backend shuts down. I'm sure this will
cause other issues -- for example %_SHARED will be destroyed way too
early.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Sun, Sep 20, 2009 at 10:00:01PM -0400, Alvaro Herrera wrote:
There's a definitional problem here however. When should we call the
destructor? My impression is that it should happen when the calling
query terminates, not when the backend shuts down. I'm sure this will
cause other issues -- for example %_SHARED will be destroyed way too
early.
The perlmod man page says:
An "END" code block is executed as late as possible, that is, after
perl has finished running the program and just before the interpreter
is being exited, even if it is exiting as a result of a die() function.
[...]
Note that "END" code blocks are not executed at the end of a string
"eval()": if any "END" code blocks are created in a string "eval()",
they will be executed just as any other "END" code block of that pack‐
age in LIFO order just before the interpreter is being exited.
so executing at the end of query, a transaction, or a session would be
wrong. They should execute "as late as possible", "just before the
interpreter being exited".
Tim.
On Mon, Sep 21, 2009 at 11:05:43AM +0100, Tim Bunce wrote:
On Sun, Sep 20, 2009 at 10:00:01PM -0400, Alvaro Herrera wrote:
There's a definitional problem here however. When should we call the
destructor? My impression is that it should happen when the calling
query terminates, not when the backend shuts down. I'm sure this will
cause other issues -- for example %_SHARED will be destroyed way too
early.The perlmod man page says:
An "END" code block is executed as late as possible, that is, after
perl has finished running the program and just before the interpreter
is being exited, even if it is exiting as a result of a die() function.
[...]
Note that "END" code blocks are not executed at the end of a string
"eval()": if any "END" code blocks are created in a string "eval()",
they will be executed just as any other "END" code block of that pack‐
age in LIFO order just before the interpreter is being exited.so executing at the end of query, a transaction, or a session would be
wrong. They should execute "as late as possible", "just before the
interpreter being exited".
Taken literally, that would mean, "the last action before the backend
exits," but at least to me, that sounds troubling for the same reasons
that "end of transaction" triggers do. What happens when there are
two different END blocks in a session? With connection poolers,
backends can last quite awhile. Is it OK for the END block to run
hours after the rest of the code?
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter escribi�:
Taken literally, that would mean, "the last action before the backend
exits," but at least to me, that sounds troubling for the same reasons
that "end of transaction" triggers do. What happens when there are
two different END blocks in a session?
The manual is clear that both are executed.
With connection poolers, backends can last quite awhile. Is it OK for
the END block to run hours after the rest of the code?
This is an interesting point -- should END blocks be called on DISCARD ALL?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
David Fetter escribi�:
Taken literally, that would mean, "the last action before the
backend exits," but at least to me, that sounds troubling for the
same reasons that "end of transaction" triggers do. What happens
when there are two different END blocks in a session?The manual is clear that both are executed.
So it is, but does order matter, and if so, how would PostgreSQL know?
With connection poolers, backends can last quite awhile. Is it OK
for the END block to run hours after the rest of the code?This is an interesting point -- should END blocks be called on
DISCARD ALL?
ENOCLUE
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter escribi�:
On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
David Fetter escribi�:
Taken literally, that would mean, "the last action before the
backend exits," but at least to me, that sounds troubling for the
same reasons that "end of transaction" triggers do. What happens
when there are two different END blocks in a session?The manual is clear that both are executed.
So it is, but does order matter, and if so, how would PostgreSQL know?
The fine manual saith
You may have multiple "END" blocks within a file--they will execute in
reverse order of definition; that is: last in, first out (LIFO).
But then, why would we care? We just call the destructor and Perl
ensures that the blocks are called in the right order.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Mon, Sep 21, 2009 at 12:06 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
David Fetter escribió:
Taken literally, that would mean, "the last action before the backend
exits," but at least to me, that sounds troubling for the same reasons
that "end of transaction" triggers do. What happens when there are
two different END blocks in a session?The manual is clear that both are executed.
With connection poolers, backends can last quite awhile. Is it OK for
the END block to run hours after the rest of the code?This is an interesting point -- should END blocks be called on DISCARD ALL?
It seems pretty reasonable that it would. The intention of DISCARD
ALL is to completely reset the entire session.
...Robert
On Mon, Sep 21, 2009 at 01:06:17PM -0400, Alvaro Herrera wrote:
David Fetter escribi�:
On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
David Fetter escribi�:
Taken literally, that would mean, "the last action before the
backend exits," but at least to me, that sounds troubling for
the same reasons that "end of transaction" triggers do. What
happens when there are two different END blocks in a session?The manual is clear that both are executed.
So it is, but does order matter, and if so, how would PostgreSQL
know?The fine manual saith
You may have multiple "END" blocks within a file--they will
execute in reverse order of definition; that is: last in, first
out (LIFO).But then, why would we care? We just call the destructor and Perl
ensures that the blocks are called in the right order.
This is not quite what I meant. Let's say we have two or more different
PL/Perl functions executed over the course of a backend. Which one's
END block gets executed last? Do we need to warn people about this?
Generate a WARNING, even?
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, Sep 21, 2009 at 2:17 PM, David Fetter <david@fetter.org> wrote:
On Mon, Sep 21, 2009 at 01:06:17PM -0400, Alvaro Herrera wrote:
David Fetter escribió:
On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
David Fetter escribió:
Taken literally, that would mean, "the last action before the
backend exits," but at least to me, that sounds troubling for
the same reasons that "end of transaction" triggers do. What
happens when there are two different END blocks in a session?The manual is clear that both are executed.
So it is, but does order matter, and if so, how would PostgreSQL
know?The fine manual saith
You may have multiple "END" blocks within a file--they will
execute in reverse order of definition; that is: last in, first
out (LIFO).But then, why would we care? We just call the destructor and Perl
ensures that the blocks are called in the right order.This is not quite what I meant. Let's say we have two or more different
PL/Perl functions executed over the course of a backend. Which one's
END block gets executed last? Do we need to warn people about this?
Generate a WARNING, even?
This is a feature of the Perl language. I don't think it's our job to
second-guess the language design, however good or bad it may be. As a
long-time Perl programmer, I would certainly say that if you are
counting on the execution ordering of your END blocks, you are
probably playing with fire and likely ought to rethink your
application design, because there are all kinds of ways this could
fail spectacularly as a result of apparently innocuous application
changes (like, say, alphabetizing the list of "use" declarations in
some package). But that's true not only with PL/perl but with just
plain old perl, and I don't see that it's substantially more dangerous
here than anywhere else.
...Robert
On Mon, Sep 21, 2009 at 02:28:11PM -0400, Robert Haas wrote:
On Mon, Sep 21, 2009 at 2:17 PM, David Fetter <david@fetter.org> wrote:
On Mon, Sep 21, 2009 at 01:06:17PM -0400, Alvaro Herrera wrote:
David Fetter escribi�:
On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
David Fetter escribi�:
Taken literally, that would mean, "the last action before the
backend exits," but at least to me, that sounds troubling for
the same reasons that "end of transaction" triggers do. �What
happens when there are two different END blocks in a session?The manual is clear that both are executed.
So it is, but does order matter, and if so, how would PostgreSQL
know?The fine manual saith
� � � You may have multiple "END" blocks within a file--they will
� � � execute in reverse order of definition; that is: last in, first
� � � out (LIFO).But then, why would we care? �We just call the destructor and Perl
ensures that the blocks are called in the right order.This is not quite what I meant. �Let's say we have two or more different
PL/Perl functions executed over the course of a backend. �Which one's
END block gets executed last? �Do we need to warn people about this?
Generate a WARNING, even?This is a feature of the Perl language. I don't think it's our job to
second-guess the language design, however good or bad it may be. As a
long-time Perl programmer, I would certainly say that if you are
counting on the execution ordering of your END blocks, you are
probably playing with fire and likely ought to rethink your
application design, because there are all kinds of ways this could
fail spectacularly as a result of apparently innocuous application
changes (like, say, alphabetizing the list of "use" declarations in
some package). But that's true not only with PL/perl but with just
plain old perl, and I don't see that it's substantially more dangerous
here than anywhere else.
OK, we've considered it and decided it's people's own foot-gun :)
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter escribi�:
On Mon, Sep 21, 2009 at 01:06:17PM -0400, Alvaro Herrera wrote:
The fine manual saith
You may have multiple "END" blocks within a file--they will
execute in reverse order of definition; that is: last in, first
out (LIFO).But then, why would we care? We just call the destructor and Perl
ensures that the blocks are called in the right order.This is not quite what I meant. Let's say we have two or more different
PL/Perl functions executed over the course of a backend. Which one's
END block gets executed last?
I think the manual is quite clear on this point. It talks about "files"
which we don't have, but other than that it doesn't seem like there
shouldn't be any problem.
Now that I think about it, this only affects loaded modules, not the
plperl functions themselves, right? I mean, you can't define an END
block inside a function.
Do we need to warn people about this?
I don't see why not -- in the docs, of course.
Generate a WARNING, even?
"Log spam" anyone?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Sep 21, 2009 at 3:08 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
David Fetter escribió:
On Mon, Sep 21, 2009 at 01:06:17PM -0400, Alvaro Herrera wrote:
The fine manual saith
You may have multiple "END" blocks within a file--they will
execute in reverse order of definition; that is: last in, first
out (LIFO).But then, why would we care? We just call the destructor and Perl
ensures that the blocks are called in the right order.This is not quite what I meant. Let's say we have two or more different
PL/Perl functions executed over the course of a backend. Which one's
END block gets executed last?I think the manual is quite clear on this point. It talks about "files"
which we don't have, but other than that it doesn't seem like there
shouldn't be any problem.Now that I think about it, this only affects loaded modules, not the
plperl functions themselves, right? I mean, you can't define an END
block inside a function.
You might think that, but it turns out the world of Perl is crazier
than the ordinary mind can fathom.
$ perl -e 'sub foo { END { print "hi\n" } }'
hi
...Robert
David Fetter <david@fetter.org> writes:
On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
With connection poolers, backends can last quite awhile. Is it OK
for the END block to run hours after the rest of the code?This is an interesting point -- should END blocks be called on
DISCARD ALL?
ENOCLUE
And in the same vein, should they be called inside a transaction,
or not? What if they fail?
I don't see any reason whatsoever that we couldn't just document this
as a Perl feature not supported in plperl. If you do something like
creating threads inside plperl, we're going to give you the raspberry
when you complain about it breaking. END blocks can perfectly well
go into the same category.
regards, tom lane
On Mon, Sep 21, 2009 at 07:30:51PM -0400, Tom Lane wrote:
David Fetter <david@fetter.org> writes:
On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
With connection poolers, backends can last quite awhile. Is it OK
for the END block to run hours after the rest of the code?
Yes.
This is an interesting point -- should END blocks be called on
DISCARD ALL?ENOCLUE
And in the same vein, should they be called inside a transaction,
or not? What if they fail?
As I said in the original ticket, I'd be quite happy for plperl END
blocks to have no access to postgres at all, other than warnings going
to the log. The spi_* functions could return an error if postgres is
being shutdown (perhaps they already would if perl_destruct is called
late in the shutdown sequence). So transactions are mute.
Also, perl_destruct() will catch any exceptions from END blocks.
I don't see any reason whatsoever that we couldn't just document this
as a Perl feature not supported in plperl. If you do something like
creating threads inside plperl, we're going to give you the raspberry
when you complain about it breaking. END blocks can perfectly well
go into the same category.
Returning to my original use case, the NYTProf profiler needs END blocks
to work otherwise the generated profile data will be corrupt.
I don't see any reason not to add PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
to plperl_init_interp(), and for perl_destruct() to be called late in
the shutdown sequence.
Tim.
On Mon, Sep 21, 2009 at 7:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Fetter <david@fetter.org> writes:
On Mon, Sep 21, 2009 at 12:06:30PM -0400, Alvaro Herrera wrote:
With connection poolers, backends can last quite awhile. Is it OK
for the END block to run hours after the rest of the code?This is an interesting point -- should END blocks be called on
DISCARD ALL?ENOCLUE
And in the same vein, should they be called inside a transaction,
or not? What if they fail?I don't see any reason whatsoever that we couldn't just document this
as a Perl feature not supported in plperl. If you do something like
creating threads inside plperl, we're going to give you the raspberry
when you complain about it breaking. END blocks can perfectly well
go into the same category.
If the changes are simple, as Tim seems to believe, exactly what do we
lose by doing this?
...Robert