plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

Started by Hannu Krosingover 14 years ago52 messageshackers
Jump to latest
#1Hannu Krosing
hannu@tm.ee

Hi

I have been helping some people to debug a SIGALARM related crash
induced by using pl/perlu http get functionality

I have been so far able to repeat the crash only on Debian 64 bit
computers. DB create script and instructions for reproducing the crash
attached

The crash is related to something leaving begind a bad SIGALARM handler,
as it can be (kind of) fixed by resetting sigalarm to nothing using perl
function

REATE OR REPLACE FUNCTION reset_sigalarm() RETURNS VOID
LANGUAGE plperlu
AS $_X$
$SIG{ALRM} = 'IGNORE';
$_X$;

( unfortunately this hoses deadlock detection and statement_timeout )

Environment where this crash does happen:

Debian GNU/Linux 6.0 - x86-64
openssl 0.9.8o-4squeeze1
postgresql-9.0 9.0.4-1~bpo60+1
postgresql-plperl-9.0 9.0.4-1~bpo60+1
libwww-perl 5.836-1

Postgresql is installed from backports

It does not happen on 32 bit ubuntu

--
-------
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/

Attachments:

plperl_crashtest.sqltext/x-sql; charset=UTF-8; name=plperl_crashtest.sqlDownload
#2Andrew Dunstan
andrew@dunslane.net
In reply to: Hannu Krosing (#1)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On 08/04/2011 09:07 AM, Hannu Krosing wrote:

Hi

I have been helping some people to debug a SIGALARM related crash
induced by using pl/perlu http get functionality

I have been so far able to repeat the crash only on Debian 64 bit
computers. DB create script and instructions for reproducing the crash
attached

The crash is related to something leaving begind a bad SIGALARM handler,
as it can be (kind of) fixed by resetting sigalarm to nothing using perl
function

So doesn't this look like a bug in the perl module that sets the signal
handler and doesn't restore it?

What happens if you wrap the calls to the module like this?:

{
local $SIG{ALRM};
# do LWP stuff here
}
return 'OK';

That should restore the old handler on exit from the block.

I think if you use a perl module that monkeys with the signal handlers
for any signal postgres uses all bets are off.

cheers

andrew

#3Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#1)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Thu, 2011-08-04 at 15:07 +0200, Hannu Krosing wrote:

Hi

I have been helping some people to debug a SIGALARM related crash
induced by using pl/perlu http get functionality

I have been so far able to repeat the crash only on Debian 64 bit
computers. DB create script and instructions for reproducing the crash
attached

Resending - the previous one was in pre-edit stage with
instructions/comments in estonian :(

The crash is related to something leaving begind a bad SIGALARM handler,
as it can be (kind of) fixed by resetting sigalarm to nothing using perl
function

REATE OR REPLACE FUNCTION reset_sigalarm() RETURNS VOID
LANGUAGE plperlu
AS $_X$
$SIG{ALRM} = 'IGNORE';
$_X$;

( unfortunately this hoses deadlock detection and statement_timeout )

Environment where this crash does happen:

Debian GNU/Linux 6.0 - x86-64
openssl 0.9.8o-4squeeze1
postgresql-9.0 9.0.4-1~bpo60+1
postgresql-plperl-9.0 9.0.4-1~bpo60+1
libwww-perl 5.836-1

Postgresql is installed from backports

It does not happen on 32 bit ubuntu

--
-------
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/

Attachments:

plperl_crashtest.sqltext/x-sql; charset=UTF-8; name=plperl_crashtest.sqlDownload
#4Hannu Krosing
hannu@tm.ee
In reply to: Andrew Dunstan (#2)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:

On 08/04/2011 09:07 AM, Hannu Krosing wrote:

Hi

I have been helping some people to debug a SIGALARM related crash
induced by using pl/perlu http get functionality

I have been so far able to repeat the crash only on Debian 64 bit
computers. DB create script and instructions for reproducing the crash
attached

The crash is related to something leaving begind a bad SIGALARM handler,
as it can be (kind of) fixed by resetting sigalarm to nothing using perl
function

So doesn't this look like a bug in the perl module that sets the signal
handler and doesn't restore it?

What happens if you wrap the calls to the module like this?:

{
local $SIG{ALRM};
# do LWP stuff here
}
return 'OK';

That should restore the old handler on exit from the block.

I think if you use a perl module that monkeys with the signal handlers
for any signal postgres uses all bets are off.

Sure, but how expensive would it be for pl/perl to do this
automatically ?

Show quoted text

cheers

andrew

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Hannu Krosing (#4)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

Excerpts from Hannu Krosing's message of jue ago 04 09:53:40 -0400 2011:

On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:

On 08/04/2011 09:07 AM, Hannu Krosing wrote:

I have been helping some people to debug a SIGALARM related crash
induced by using pl/perlu http get functionality

I have been so far able to repeat the crash only on Debian 64 bit
computers. DB create script and instructions for reproducing the crash
attached

The crash is related to something leaving begind a bad SIGALARM handler,
as it can be (kind of) fixed by resetting sigalarm to nothing using perl
function

So doesn't this look like a bug in the perl module that sets the signal
handler and doesn't restore it?

I vaguely remember looking in the guts of LWP::UserAgent a few years ago
and being rather annoyed at the way it dealt with sigalrm -- it
interfered with other uses we had for the signal. I think we had to run
a patched version of that module or something, not sure.

What happens if you wrap the calls to the module like this?:

{
local $SIG{ALRM};
# do LWP stuff here
}
return 'OK';

That should restore the old handler on exit from the block.

Sure, but how expensive would it be for pl/perl to do this
automatically ?

Probably too much, but then since this is an untrusted pl my guess is
that it's OK to request the user to do it only in functions that need
it. I wonder if we could have a check on return from a function that
the sighandler is still what we had before the function was called, to
help discover this problem.

I think if you use a perl module that monkeys with the signal handlers
for any signal postgres uses all bets are off.

Yeah.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#4)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

Hannu Krosing <hannu@krosing.net> writes:

On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:

On 08/04/2011 09:07 AM, Hannu Krosing wrote:

The crash is related to something leaving begind a bad SIGALARM handler,

So doesn't this look like a bug in the perl module that sets the signal
handler and doesn't restore it?
I think if you use a perl module that monkeys with the signal handlers
for any signal postgres uses all bets are off.

Sure, but how expensive would it be for pl/perl to do this
automatically ?

How can anything like that possibly work with any reliability
whatsoever? If the signal comes in, you don't know whether it was
triggered by the event Postgres expected, or the event the perl module
expected, and hence there's no way to deliver it to the right signal
handler (not that the code you're describing is even trying to do that).

What *I'd* like is a way to prevent libperl from touching the host
application's signal handlers at all. Sadly, Perl does not actually
think of itself as an embedded library, and therefore thinks it owns all
resources of the process and can diddle them without anybody's
permission.

regards, tom lane

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Hannu Krosing (#4)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On 08/04/2011 09:53 AM, Hannu Krosing wrote:

What happens if you wrap the calls to the module like this?:

{
local $SIG{ALRM};
# do LWP stuff here
}
return 'OK';

That should restore the old handler on exit from the block.

I think if you use a perl module that monkeys with the signal handlers
for any signal postgres uses all bets are off.

Sure, but how expensive would it be for pl/perl to do this
automatically ?

Probably not very. It could possibly be added to
plc_perlboot.pl::mkfuncsrc() after the prolog, or maybe before.

cheers

andrew

#8Alexey Klyukin
alexk@commandprompt.com
In reply to: Alvaro Herrera (#5)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Aug 4, 2011, at 5:25 PM, Alvaro Herrera wrote:

Excerpts from Hannu Krosing's message of jue ago 04 09:53:40 -0400 2011:

On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:

On 08/04/2011 09:07 AM, Hannu Krosing wrote:

I have been helping some people to debug a SIGALARM related crash
induced by using pl/perlu http get functionality

I have been so far able to repeat the crash only on Debian 64 bit
computers. DB create script and instructions for reproducing the crash
attached

The crash is related to something leaving begind a bad SIGALARM handler,
as it can be (kind of) fixed by resetting sigalarm to nothing using perl
function

So doesn't this look like a bug in the perl module that sets the signal
handler and doesn't restore it?

I vaguely remember looking in the guts of LWP::UserAgent a few years ago
and being rather annoyed at the way it dealt with sigalrm -- it
interfered with other uses we had for the signal. I think we had to run
a patched version of that module or something, not sure.

What happens if you wrap the calls to the module like this?:

{
local $SIG{ALRM};
# do LWP stuff here
}
return 'OK';

That should restore the old handler on exit from the block.

Sure, but how expensive would it be for pl/perl to do this
automatically ?

Probably too much, but then since this is an untrusted pl my guess is
that it's OK to request the user to do it only in functions that need
it. I wonder if we could have a check on return from a function that
the sighandler is still what we had before the function was called, to
help discover this problem.

If we can do that, than why won't we move a step further and restore an old
signal handler on mismatch?

--
Command Prompt, Inc. http://www.CommandPrompt.com
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#6)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On 08/04/2011 10:28 AM, Tom Lane wrote:

How can anything like that possibly work with any reliability
whatsoever? If the signal comes in, you don't know whether it was
triggered by the event Postgres expected, or the event the perl module
expected, and hence there's no way to deliver it to the right signal
handler (not that the code you're describing is even trying to do that).

True.

What *I'd* like is a way to prevent libperl from touching the host
application's signal handlers at all. Sadly, Perl does not actually
think of itself as an embedded library, and therefore thinks it owns all
resources of the process and can diddle them without anybody's
permission.

I'm not sure how perl (or any loadable library) could restrict that in
loaded C code, which many perl modules call directly or indirectly. It's
as open as, say, a loadable C function is in Postgres ;-) You have a
gun. It's loaded. If you point it at your foot and pull the trigger
don't blame us. I think you just need to be very careful about what you
do with plperlu. Don't be surprised if things break.

cheers

andrew

#10Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#6)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Thu, 2011-08-04 at 10:28 -0400, Tom Lane wrote:

Hannu Krosing <hannu@krosing.net> writes:

On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:

On 08/04/2011 09:07 AM, Hannu Krosing wrote:

The crash is related to something leaving begind a bad SIGALARM handler,

So doesn't this look like a bug in the perl module that sets the signal
handler and doesn't restore it?
I think if you use a perl module that monkeys with the signal handlers
for any signal postgres uses all bets are off.

Sure, but how expensive would it be for pl/perl to do this
automatically ?

How can anything like that possibly work with any reliability
whatsoever? If the signal comes in, you don't know whether it was
triggered by the event Postgres expected, or the event the perl module
expected, and hence there's no way to deliver it to the right signal
handler (not that the code you're describing is even trying to do that).

What *I'd* like is a way to prevent libperl from touching the host
application's signal handlers at all. Sadly, Perl does not actually
think of itself as an embedded library, and therefore thinks it owns all
resources of the process and can diddle them without anybody's
permission.

It then seems that it is a goo idea to treat any fiddling with
postgreSQL signal handlers as an error, and rise an ERROR if any signal
handler has changed between calling the function and return, in a way
suggested by Alvaro

This at least forces the developer to pay attention and in case of
pl/perl function use something like the

{
local $SIG{ALRM};
# do LWP stuff here
}
return 'OK';

trick suggested by Andrew Dunstan

I know that this is not the real solution, bu at least it is easier to
debug than leaving a round signal handlers pointing to non-existent
code, which will trigger next time the deadlock checker tries to run.

--
-------
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/

#11Alex Hunsaker
badalex@gmail.com
In reply to: Andrew Dunstan (#9)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Thu, Aug 4, 2011 at 09:11, Andrew Dunstan <andrew@dunslane.net> wrote:

What *I'd* like is a way to prevent libperl from touching the host
application's signal handlers at all.  Sadly, Perl does not actually
think of itself as an embedded library, and therefore thinks it owns all
resources of the process and can diddle them without anybody's
permission.

I'm not sure how perl (or any loadable library) could restrict that in
loaded C code, which many perl modules call directly or indirectly. It's as
open as, say, a loadable C function is in Postgres ;-) You have a gun. It's
loaded. If you point it at your foot and pull the trigger don't blame us. I
think you just need to be very careful about what you do with plperlu. Don't
be surprised if things break.

Well we can't prevent perl XS (aka C) from messing with signals (and
other modules like POSIX that expose things like sigprocmask,
siglongjump etc.) , but we could prevent plperl(u) from playing with
signals on the perl level ala %SIG.

[ IIRC I proposed doing something about this when we were talking
about the whole Safe mess, but I think there was too much other
discussion going on at the time :-) ]

Mainly the options im thinking about are:
1) if anyone touches %SIG die
2) turn %SIG into a regular hash so people can set/play with %SIG, but
it has no real effect.
3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)
4) if we can't get any of the above to work we can save each %SIG
handler before and restore them after each trigger call. (mod_perl
does something similar so Im fairly certain we should be able to get
that to work)

Thoughts?

#12David E. Wheeler
david@kineticode.com
In reply to: Alex Hunsaker (#11)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:

Mainly the options im thinking about are:
1) if anyone touches %SIG die
2) turn %SIG into a regular hash so people can set/play with %SIG, but
it has no real effect.

These would disable stuff like $SIG{__WARN__} and $SIG{__DIE__}, which would be an unfortunate side-effect.

3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)

+1

4) if we can't get any of the above to work we can save each %SIG
handler before and restore them after each trigger call. (mod_perl
does something similar so Im fairly certain we should be able to get
that to work)

+1

Best,

David

#13Alex Hunsaker
badalex@gmail.com
In reply to: David E. Wheeler (#12)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote:

On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:

Mainly the options im thinking about are:
1) if anyone touches %SIG die
2) turn %SIG into a regular hash so people can set/play with %SIG, but
it has no real effect.

These would disable stuff like $SIG{__WARN__} and $SIG{__DIE__}, which would be an unfortunate side-effect.

Yeah, good point.

3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)

+1

That seems to be what most people up-thread thought as well. I dont
see it being too expensive. Ill see if I can whip something up today.

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alex Hunsaker (#13)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

Alex Hunsaker <badalex@gmail.com> writes:

On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote:

On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:

3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)

+1

That seems to be what most people up-thread thought as well. I dont
see it being too expensive. Ill see if I can whip something up today.

The scenario I was imagining was:

1. perl temporarily takes over SIGALRM.

2. while perl function is running, statement_timeout expires, causing
SIGALRM to be delivered.

3. perl code is probably totally confused, and even if it isn't,
statement_timeout will not be enforced since Postgres won't ever get the
interrupt.

Even if you don't think statement_timeout is a particularly critical
piece of functionality, similar interference with the delivery of, say,
SIGUSR1 would be catastrophic.

How do you propose to prevent this sort of problem?

regards, tom lane

#15Alex Hunsaker
badalex@gmail.com
In reply to: Tom Lane (#14)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Thu, Aug 4, 2011 at 17:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alex Hunsaker <badalex@gmail.com> writes:

On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote:

On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:

3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)

+1

That seems to be what most people up-thread thought as well. I dont
see it being too expensive. Ill see if I can whip something up today.

The scenario I was imagining was:
[ $SIG{ALRM} + statement timeout-- what happens?]
....
Even if you don't think statement_timeout is a particularly critical
piece of functionality, similar interference with the delivery of, say,
SIGUSR1 would be catastrophic.

Yipes, I see your point.

How do you propose to prevent this sort of problem?

Well, I think that makes it unworkable.

So back to #1 or #2.

For plperlu sounds like we are going to need to disallow setting _any_
signals (minus __DIE__ and __WARN__). I should be able to make it so
when you try it gives you a warning something along the lines of
"plperl can't set signal handlers, ignoring...".

For plperl I think we should probably do the same. It seems like
Andrew might disagree though? Anyone else want to chime in on if
plperl lets you muck with signal handlers?

Im not entirely sure how much of this is workable, I still need to go
through perl's guts and see. At the very worst I think we can mark
each signal handler that is exposed in %SIG readonly (which would mean
we would die instead of warning), but I think I can make the warning
variant workable as well.

I also have not dug deep enough to know how to handle __WARN__ and
__DIE__ (and exactly what limitations allowing those will impose). I
still have some work at $day_job before I can really dig into this.

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Alex Hunsaker (#15)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On 08/04/2011 08:44 PM, Alex Hunsaker wrote:

On Thu, Aug 4, 2011 at 17:52, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Alex Hunsaker<badalex@gmail.com> writes:

On Thu, Aug 4, 2011 at 16:34, David E. Wheeler<david@kineticode.com> wrote:

On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:

3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)

+1

That seems to be what most people up-thread thought as well. I dont
see it being too expensive. Ill see if I can whip something up today.

The scenario I was imagining was:
[ $SIG{ALRM} + statement timeout-- what happens?]
....
Even if you don't think statement_timeout is a particularly critical
piece of functionality, similar interference with the delivery of, say,
SIGUSR1 would be catastrophic.

Yipes, I see your point.

How do you propose to prevent this sort of problem?

Well, I think that makes it unworkable.

So back to #1 or #2.

For plperlu sounds like we are going to need to disallow setting _any_
signals (minus __DIE__ and __WARN__). I should be able to make it so
when you try it gives you a warning something along the lines of
"plperl can't set signal handlers, ignoring...".

For plperl I think we should probably do the same. It seems like
Andrew might disagree though? Anyone else want to chime in on if
plperl lets you muck with signal handlers?

Im not entirely sure how much of this is workable, I still need to go
through perl's guts and see. At the very worst I think we can mark
each signal handler that is exposed in %SIG readonly (which would mean
we would die instead of warning), but I think I can make the warning
variant workable as well.

I also have not dug deep enough to know how to handle __WARN__ and
__DIE__ (and exactly what limitations allowing those will impose). I
still have some work at $day_job before I can really dig into this.

Let's slow down a bit. Nobody that we know of has encountered the
problem Tom's referring to, over all the years plperlu has been
available. The changes you're proposing have the potential to downgrade
the usefulness of plperlu considerably without fixing anything that's
known to be an actual problem. Instead of fixing a problem caused by
using LWP you could well make LWP totally unusable from plperlu.

And it still won't do a thing about signal handlers installed by C code.

And plperlu would be the tip of the iceberg. What about all the other
PLs, not to mention non-PL loadable modules?

But we *can* fix the original problem reported, namely failure to
restore signal handlers on function exit, with very little downside
(assuming it's shown to be fairly cheap).

cheers

andrew

#17Alex Hunsaker
badalex@gmail.com
In reply to: Andrew Dunstan (#16)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Thu, Aug 4, 2011 at 19:40, Andrew Dunstan <andrew@dunslane.net> wrote:

Let's slow down a bit. Nobody that we know of has encountered the problem
Tom's referring to, over all the years plperlu has been available. The
changes you're proposing have the potential to downgrade the usefulness of
plperlu considerably without fixing anything that's known to be an actual
problem. Instead of fixing a problem caused by using LWP you could well make
LWP totally unusable from plperlu.

Well, im not sure about it making LWP totally unusable... You could
always use statement_timeout if you were worried about it blocking
^_^.

And it still won't do a thing about signal handlers installed by C code.

And plperlu would be the tip of the iceberg. What about all the other PLs,
not to mention non-PL loadable modules?

Maybe the answer is to re-issue the appropriate pqsignals() instead of
doing the perl variant?

For PL/Perl(u) we could still disallow any signals the postmaster
uses, from my quick look that would be: HUP, INT, TERM, QUIT, ALRM,
PIPE, USR1, USR2, FPE. All we would need to do is restore ALRM.

Or am I too paranoid about someone shooting themselves in the foot via
USR1? (BTW you can set signals in plperl, but you can't call alarm()
or kill())

#18Andrew Dunstan
andrew@dunslane.net
In reply to: Alex Hunsaker (#17)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On 08/04/2011 11:23 PM, Alex Hunsaker wrote:

On Thu, Aug 4, 2011 at 19:40, Andrew Dunstan<andrew@dunslane.net> wrote:

Let's slow down a bit. Nobody that we know of has encountered the problem
Tom's referring to, over all the years plperlu has been available. The
changes you're proposing have the potential to downgrade the usefulness of
plperlu considerably without fixing anything that's known to be an actual
problem. Instead of fixing a problem caused by using LWP you could well make
LWP totally unusable from plperlu.

Well, im not sure about it making LWP totally unusable... You could
always use statement_timeout if you were worried about it blocking
^_^.

Making users set statement_timeout would be a degradation in utility.
For one thing it means you'd never be able to get back and handle an
unresponsiveness reply. And it would be extra work.

(I don't use LWP in any plperlu code AFAIR, but I do use other things
that could well want to set signals. At the very least a change like
this would mandate a LOT of extra testing by my clients.)

And it still won't do a thing about signal handlers installed by C code.

And plperlu would be the tip of the iceberg. What about all the other PLs,
not to mention non-PL loadable modules?

Maybe the answer is to re-issue the appropriate pqsignals() instead of
doing the perl variant?

For PL/Perl(u) we could still disallow any signals the postmaster
uses, from my quick look that would be: HUP, INT, TERM, QUIT, ALRM,
PIPE, USR1, USR2, FPE. All we would need to do is restore ALRM.

Or am I too paranoid about someone shooting themselves in the foot via
USR1? (BTW you can set signals in plperl, but you can't call alarm()
or kill())

This whole thing is a massive over-reaction to a problem we almost
certainly know how to fix fairly simply and relatively painlessly, and
attempts (unsuccessfully, at least insofar as comprehensiveness is
concerned) to fix a problem nobody's actually reported having AFAIK.

cheers

andrew

#19Alex Hunsaker
badalex@gmail.com
In reply to: Andrew Dunstan (#18)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Fri, Aug 5, 2011 at 08:53, Andrew Dunstan <andrew@dunslane.net> wrote:

On 08/04/2011 11:23 PM, Alex Hunsaker wrote:

[ ... don't let people set signal handlers postgres sets ]

This whole thing is a massive over-reaction to a problem we almost certainly
know how to fix fairly simply and relatively painlessly, and attempts
(unsuccessfully, at least insofar as comprehensiveness is concerned) to fix
a problem nobody's actually reported having AFAIK.

*shrug* OK.

Find attached a version that does the equivalent of local %SIG for
each pl/perl(u) call. I was only able to test on 5.14.1, but I looked
at 5.8.9 to make sure it looks like it works.

Its a tad slower (something like 0.00037ms per call), but uhh thats
quite acceptable IMHO (best of 5 runs):

=> create or replace function simple() returns void as $$ $$ language plperl;
CREATE FUNCTION

-- pre patch
=> select count(simple()) from generate_series(1, 10000000);
Time: 10219.149 ms

-- patched
=> select count(simple()) from generate_series(1, 10000000);
Time: 13924.025 ms

Thoughts?

Attachments:

plperl_local_sig.patchtext/x-patch; charset=US-ASCII; name=plperl_local_sig.patchDownload+17-1
#20Tim Bunce
Tim.Bunce@pobox.com
In reply to: Alex Hunsaker (#19)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

[I've included a summary of the thread and Bcc'd this to perl5-porters
for a sanity check. Please trim heavily when replying.]

On Thu, Aug 04, 2011 at 09:42:31AM -0400, Andrew Dunstan wrote:

So doesn't this look like a bug in the perl module that sets the
signal handler and doesn't restore it?

What happens if you wrap the calls to the module like this?:

{
local $SIG{ALRM};
# do LWP stuff here
}
return 'OK';

That should restore the old handler on exit from the block.

I think if you use a perl module that monkeys with the signal
handlers for any signal postgres uses all bets are off.

On Thu, Aug 04, 2011 at 10:28:45AM -0400, Tom Lane wrote:

Sure, but how expensive would it be for pl/perl to do this
automatically ?

How can anything like that possibly work with any reliability
whatsoever? If the signal comes in, you don't know whether it was
triggered by the event Postgres expected, or the event the perl module
expected, and hence there's no way to deliver it to the right signal
handler (not that the code you're describing is even trying to do that).

What *I'd* like is a way to prevent libperl from touching the host
application's signal handlers at all. Sadly, Perl does not actually
think of itself as an embedded library, and therefore thinks it owns all
resources of the process and can diddle them without anybody's
permission.

The PERL_IMPLICIT_SYS mechanism addresses this. Unfortunately it only
works with USE_ITHREADS on Windows currently.
http://perldoc.perl.org/perlguts.html#Future-Plans-and-PERL_IMPLICIT_SYS

On Thu, Aug 04, 2011 at 04:09:47PM -0600, Alex Hunsaker wrote:

Well we can't prevent perl XS (aka C) from messing with signals (and
other modules like POSIX that expose things like sigprocmask,
siglongjump etc.) , but we could prevent plperl(u) from playing with
signals on the perl level ala %SIG.

[ IIRC I proposed doing something about this when we were talking
about the whole Safe mess, but I think there was too much other
discussion going on at the time :-) ]

Mainly the options im thinking about are:
1) if anyone touches %SIG die
2) turn %SIG into a regular hash so people can set/play with %SIG, but
it has no real effect.
3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)
4) if we can't get any of the above to work we can save each %SIG
handler before and restore them after each trigger call. (mod_perl
does something similar so Im fairly certain we should be able to get
that to work)

On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote:

1) if anyone touches %SIG die
2) turn %SIG into a regular hash so people can set/play with %SIG, but
it has no real effect.

These would disable stuff like $SIG{__WARN__} and $SIG{__DIE__}, which would be an unfortunate side-effect.

On Thu, Aug 04, 2011 at 07:52:45PM -0400, Tom Lane wrote:

The scenario I was imagining was:

1. perl temporarily takes over SIGALRM.

2. while perl function is running, statement_timeout expires, causing
SIGALRM to be delivered.

3. perl code is probably totally confused, and even if it isn't,
statement_timeout will not be enforced since Postgres won't ever get the
interrupt.

Even if you don't think statement_timeout is a particularly critical
piece of functionality, similar interference with the delivery of, say,
SIGUSR1 would be catastrophic.

How do you propose to prevent this sort of problem?

I don't think there's complete solution for that particular scenario.
[Though redirecting the perl alarm() opcode to code that would check the
argument against the remaining seconds before statement_timeout expires,
might get close.]

On Thu, Aug 04, 2011 at 06:44:18PM -0600, Alex Hunsaker wrote:

How do you propose to prevent this sort of problem?

Well, I think that makes it unworkable.

So back to #1 or #2.

For plperlu sounds like we are going to need to disallow setting _any_
signals (minus __DIE__ and __WARN__). I should be able to make it so
when you try it gives you a warning something along the lines of
"plperl can't set signal handlers, ignoring...".

For plperl I think we should probably do the same. It seems like
Andrew might disagree though? Anyone else want to chime in on if
plperl lets you muck with signal handlers?

Im not entirely sure how much of this is workable, I still need to go
through perl's guts and see. At the very worst I think we can mark
each signal handler that is exposed in %SIG readonly (which would mean
we would die instead of warning), but I think I can make the warning
variant workable as well.

I also have not dug deep enough to know how to handle __WARN__ and
__DIE__ (and exactly what limitations allowing those will impose). I
still have some work at $day_job before I can really dig into this.

On Thu, Aug 04, 2011 at 09:40:57PM -0400, Andrew Dunstan wrote:

Let's slow down a bit. Nobody that we know of has encountered the
problem Tom's referring to, over all the years plperlu has been
available. The changes you're proposing have the potential to
downgrade the usefulness of plperlu considerably without fixing
anything that's known to be an actual problem. Instead of fixing a
problem caused by using LWP you could well make LWP totally unusable
from plperlu.

And it still won't do a thing about signal handlers installed by C code.

And plperlu would be the tip of the iceberg. What about all the
other PLs, not to mention non-PL loadable modules?

But we *can* fix the original problem reported, namely failure to
restore signal handlers on function exit, with very little downside
(assuming it's shown to be fairly cheap).

On Thu, Aug 04, 2011 at 09:23:49PM -0600, Alex Hunsaker wrote:

And plperlu would be the tip of the iceberg. What about all the other PLs,
not to mention non-PL loadable modules?

Maybe the answer is to re-issue the appropriate pqsignals() instead of
doing the perl variant?

For PL/Perl(u) we could still disallow any signals the postmaster
uses, from my quick look that would be: HUP, INT, TERM, QUIT, ALRM,
PIPE, USR1, USR2, FPE. All we would need to do is restore ALRM.

Or am I too paranoid about someone shooting themselves in the foot via
USR1? (BTW you can set signals in plperl, but you can't call alarm()
or kill())

On Fri, Aug 05, 2011 at 10:53:21AM -0400, Andrew Dunstan wrote:

This whole thing is a massive over-reaction to a problem we almost
certainly know how to fix fairly simply and relatively painlessly,
and attempts (unsuccessfully, at least insofar as comprehensiveness
is concerned) to fix a problem nobody's actually reported having
AFAIK.

For plperl, as Alex noted above, kill() and alarm() can't be used but
%SIG can be altered. Locally making %SIG readonly for plperl subs
(after __DIE__ and __WARN__ are added) seems cheap and effective.

For plperlu, clearly $SIG{ALRM} is useful. Enforcing localization, thus
fixing the immediate problem, and documenting that it won't work
reliably with statement_timeout, seems like a reasonable approach.

plperlu is already a potential footgun in countless ways. Documenting
that other signal handlers, like USR1, shouldn't be used ought to be enough.

On Sat, Aug 06, 2011 at 12:37:28PM -0600, Alex Hunsaker wrote:

*shrug* OK.

Find attached a version that does the equivalent of local %SIG for
each pl/perl(u) call.

+ gv = gv_fetchpv("SIG", 0, SVt_PVHV);
+ save_hash(gv); /* local %SIG */

After a little digging and some discussion on the #p5p channel [thanks
to ilmari++ leont++ and sorear++ for their help] it seems that local(%SIG)
doesn't do what you might expect. The %SIG does become empty but the OS
level handlers, even those installed by perl, *aren't changed*:

$ perl -wE '$SIG{INT} = sub { say "Foo"}; { local %SIG; kill "INT", $$; };'
Foo

And, even worse, they're not reset at scope exit:

$ perl -wE '$SIG{INT} = sub { say "Foo"}; { local %SIG; $SIG{INT} = sub {say "Bar" }} kill "INT", $$;'
Bar

That sure seems like a bug (I'll check with the perl5-porters list).

Localizing an individual element of %SIG works fine.
In C that's something like this (untested):

hv = gv_fetchpv("SIG", 0, SVt_PVHV);
keysv = ...SV containing "ALRM"...
he = hv_fetch_ent(hv, keysv, 0, 0);
if (he) { /* arrange to restore existing elem */
save_helem_flags(hv, keysv, &HeVAL(he), SAVEf_SETMAGIC);
}
else { /* arrange to delete a new elem */
SAVEHDELETE(hv, keysv);
}

Tim.

#21Andrew Dunstan
andrew@dunslane.net
In reply to: Tim Bunce (#20)
#22Alex Hunsaker
badalex@gmail.com
In reply to: Tim Bunce (#20)
#23Tim Bunce
Tim.Bunce@pobox.com
In reply to: Alex Hunsaker (#22)
#24Andrew Dunstan
andrew@dunslane.net
In reply to: Tim Bunce (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#24)
#26Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#25)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#26)
#28Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#27)
#29David E. Wheeler
david@kineticode.com
In reply to: Andrew Dunstan (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#28)
#31Andres Freund
andres@anarazel.de
In reply to: Hannu Krosing (#1)
#32Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#31)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#32)
#34Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#33)
#35Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#33)
#36Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#35)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#35)
#38Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#37)
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#38)
#40Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#39)
#41Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#40)
#42Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#41)
#43Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#39)
#44Alex Hunsaker
badalex@gmail.com
In reply to: Andres Freund (#43)
#45Andres Freund
andres@anarazel.de
In reply to: Alex Hunsaker (#44)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#43)
#47Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#46)
#48Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#46)
#49Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#48)
#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#49)
#51Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#50)
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#51)