plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

Started by Hannu Krosing over 14 years ago, 52 messages
#1Hannu Krosing
hannu@2ndQuadrant.com
1 attachment(s)

Hi

I have been helping some people debug a SIGALRM-related crash
induced by using pl/perlu HTTP GET functionality.

I have so far been able to reproduce the crash only on 64-bit Debian
machines. A DB create script and instructions for reproducing the crash
are attached.

The crash is related to something leaving behind a bad SIGALRM handler,
as it can be (kind of) fixed by resetting SIGALRM to nothing using this
Perl function:

CREATE OR REPLACE FUNCTION reset_sigalarm() RETURNS VOID
LANGUAGE plperlu
AS $_X$
$SIG{ALRM} = 'IGNORE';
$_X$;

( unfortunately this hoses deadlock detection and statement_timeout )

Environment where this crash does happen:

Debian GNU/Linux 6.0 - x86-64
openssl 0.9.8o-4squeeze1
postgresql-9.0 9.0.4-1~bpo60+1
postgresql-plperl-9.0 9.0.4-1~bpo60+1
libwww-perl 5.836-1

Postgresql is installed from backports

It does not happen on 32 bit ubuntu

--
-------
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/

Attachments:

plperl_crashtest.sql (text/x-sql; charset=UTF-8)
#2Andrew Dunstan
andrew@dunslane.net
In reply to: Hannu Krosing (#1)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On 08/04/2011 09:07 AM, Hannu Krosing wrote:

Hi

I have been helping some people debug a SIGALRM-related crash
induced by using pl/perlu HTTP GET functionality.

I have so far been able to reproduce the crash only on 64-bit Debian
machines. A DB create script and instructions for reproducing the crash
are attached.

The crash is related to something leaving behind a bad SIGALRM handler,
as it can be (kind of) fixed by resetting SIGALRM to nothing using a Perl
function

So doesn't this look like a bug in the perl module that sets the signal
handler and doesn't restore it?

What happens if you wrap the calls to the module like this?:

{
local $SIG{ALRM};
# do LWP stuff here
}
return 'OK';

That should restore the old handler on exit from the block.

I think if you use a perl module that monkeys with the signal handlers
for any signal postgres uses all bets are off.
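For comparison, the same save-and-restore idea can be sketched at the C level, which is roughly the effect `local $SIG{ALRM}` has on the Perl-visible handler. This is only an illustration under stated assumptions: `host_alarm_handler`, `careless_library_call` and `call_with_saved_sigalrm` are made-up names, not PostgreSQL or libperl functions, and the sketch does nothing about a SIGALRM that arrives while the replacement handler is installed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the handler the host application owns. */
static volatile sig_atomic_t host_alarm_seen = 0;

static void host_alarm_handler(int signo)
{
    (void) signo;
    host_alarm_seen = 1;
}

/* Hypothetical handler installed by a careless library. */
static void library_alarm_handler(int signo)
{
    (void) signo;
}

/* Install `handler` for SIGALRM; this sketch treats failure as a bug. */
static void set_sigalrm(void (*handler)(int))
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);
    (void) rc;
}

/* Hypothetical library call that takes over SIGALRM and never gives it back. */
static void careless_library_call(void)
{
    set_sigalrm(library_alarm_handler);
}

/*
 * Run fn(), then restore whatever SIGALRM disposition was in place before.
 * Returns 0 on success, -1 if the disposition could not be read or restored.
 */
static int call_with_saved_sigalrm(void (*fn)(void))
{
    struct sigaction saved;

    if (sigaction(SIGALRM, NULL, &saved) != 0)
        return -1;
    fn();
    return sigaction(SIGALRM, &saved, NULL);
}

/* Report whether the host handler is the current SIGALRM handler. */
static int host_handler_installed(void)
{
    struct sigaction cur;

    if (sigaction(SIGALRM, NULL, &cur) != 0)
        return 0;
    return cur.sa_handler == host_alarm_handler;
}
```

After `set_sigalrm(host_alarm_handler)`, calling `call_with_saved_sigalrm(careless_library_call)` should leave `host_handler_installed()` true once the call returns.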

cheers

andrew

#3Hannu Krosing
hannu@2ndQuadrant.com
In reply to: Hannu Krosing (#1)
1 attachment(s)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Thu, 2011-08-04 at 15:07 +0200, Hannu Krosing wrote:

Hi

I have been helping some people debug a SIGALRM-related crash
induced by using pl/perlu HTTP GET functionality.

I have so far been able to reproduce the crash only on 64-bit Debian
machines. A DB create script and instructions for reproducing the crash
are attached.

Resending - the previous one was in a pre-edit stage, with
instructions/comments in Estonian :(

The crash is related to something leaving behind a bad SIGALRM handler,
as it can be (kind of) fixed by resetting SIGALRM to nothing using this
Perl function:

CREATE OR REPLACE FUNCTION reset_sigalarm() RETURNS VOID
LANGUAGE plperlu
AS $_X$
$SIG{ALRM} = 'IGNORE';
$_X$;

( unfortunately this hoses deadlock detection and statement_timeout )

Environment where this crash does happen:

Debian GNU/Linux 6.0 - x86-64
openssl 0.9.8o-4squeeze1
postgresql-9.0 9.0.4-1~bpo60+1
postgresql-plperl-9.0 9.0.4-1~bpo60+1
libwww-perl 5.836-1

Postgresql is installed from backports

It does not happen on 32 bit ubuntu

--
-------
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/

Attachments:

plperl_crashtest.sql (text/x-sql; charset=UTF-8)
#4Hannu Krosing
hannu@krosing.net
In reply to: Andrew Dunstan (#2)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:

On 08/04/2011 09:07 AM, Hannu Krosing wrote:

Hi

I have been helping some people debug a SIGALRM-related crash
induced by using pl/perlu HTTP GET functionality.

I have so far been able to reproduce the crash only on 64-bit Debian
machines. A DB create script and instructions for reproducing the crash
are attached.

The crash is related to something leaving behind a bad SIGALRM handler,
as it can be (kind of) fixed by resetting SIGALRM to nothing using a Perl
function

So doesn't this look like a bug in the perl module that sets the signal
handler and doesn't restore it?

What happens if you wrap the calls to the module like this?:

{
local $SIG{ALRM};
# do LWP stuff here
}
return 'OK';

That should restore the old handler on exit from the block.

I think if you use a perl module that monkeys with the signal handlers
for any signal postgres uses all bets are off.

Sure, but how expensive would it be for pl/perl to do this
automatically ?

cheers

andrew

#5Alvaro Herrera
alvherre@commandprompt.com
In reply to: Hannu Krosing (#4)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

Excerpts from Hannu Krosing's message of jue ago 04 09:53:40 -0400 2011:

On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:

On 08/04/2011 09:07 AM, Hannu Krosing wrote:

I have been helping some people debug a SIGALRM-related crash
induced by using pl/perlu HTTP GET functionality.

I have so far been able to reproduce the crash only on 64-bit Debian
machines. A DB create script and instructions for reproducing the crash
are attached.

The crash is related to something leaving behind a bad SIGALRM handler,
as it can be (kind of) fixed by resetting SIGALRM to nothing using a Perl
function

So doesn't this look like a bug in the perl module that sets the signal
handler and doesn't restore it?

I vaguely remember looking in the guts of LWP::UserAgent a few years ago
and being rather annoyed at the way it dealt with sigalrm -- it
interfered with other uses we had for the signal. I think we had to run
a patched version of that module or something, not sure.

What happens if you wrap the calls to the module like this?:

{
local $SIG{ALRM};
# do LWP stuff here
}
return 'OK';

That should restore the old handler on exit from the block.

Sure, but how expensive would it be for pl/perl to do this
automatically ?

Probably too much, but then since this is an untrusted pl my guess is
that it's OK to request the user to do it only in functions that need
it. I wonder if we could have a check on return from a function that
the sighandler is still what we had before the function was called, to
help discover this problem.
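A check of that kind could look something like the following at the C level. It is a minimal sketch, not what plperl does: `sigalrm_changed_by` and the two sample callees are hypothetical names, and only the handler pointer is compared (mask and flags are ignored, and a handler installed with SA_SIGINFO would need `sa_sigaction` compared instead).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handlers: one owned by the host, one left behind by a library. */
static void host_alarm_handler(int signo)
{
    (void) signo;
}

static void library_alarm_handler(int signo)
{
    (void) signo;
}

/* Install `handler` for SIGALRM; this sketch treats failure as a bug. */
static void set_sigalrm(void (*handler)(int))
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);
    (void) rc;
}

/* A callee that leaves signal handling alone. */
static void well_behaved_call(void)
{
}

/* A callee that replaces the SIGALRM handler and does not restore it. */
static void careless_library_call(void)
{
    set_sigalrm(library_alarm_handler);
}

/*
 * Run fn() and report whether the SIGALRM handler differs afterwards.
 * Returns 1 if it changed, 0 if it did not, -1 if it could not be queried.
 */
static int sigalrm_changed_by(void (*fn)(void))
{
    struct sigaction before;
    struct sigaction after;

    if (sigaction(SIGALRM, NULL, &before) != 0)
        return -1;
    fn();
    if (sigaction(SIGALRM, NULL, &after) != 0)
        return -1;
    return before.sa_handler != after.sa_handler;
}
```

A caller could turn a return value of 1 into a warning or an error, depending on how strict the check is meant to be.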

I think if you use a perl module that monkeys with the signal handlers
for any signal postgres uses all bets are off.

Yeah.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#4)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

Hannu Krosing <hannu@krosing.net> writes:

On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:

On 08/04/2011 09:07 AM, Hannu Krosing wrote:

The crash is related to something leaving behind a bad SIGALRM handler,

So doesn't this look like a bug in the perl module that sets the signal
handler and doesn't restore it?
I think if you use a perl module that monkeys with the signal handlers
for any signal postgres uses all bets are off.

Sure, but how expensive would it be for pl/perl to do this
automatically ?

How can anything like that possibly work with any reliability
whatsoever? If the signal comes in, you don't know whether it was
triggered by the event Postgres expected, or the event the perl module
expected, and hence there's no way to deliver it to the right signal
handler (not that the code you're describing is even trying to do that).

What *I'd* like is a way to prevent libperl from touching the host
application's signal handlers at all. Sadly, Perl does not actually
think of itself as an embedded library, and therefore thinks it owns all
resources of the process and can diddle them without anybody's
permission.

regards, tom lane

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Hannu Krosing (#4)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On 08/04/2011 09:53 AM, Hannu Krosing wrote:

What happens if you wrap the calls to the module like this?:

{
local $SIG{ALRM};
# do LWP stuff here
}
return 'OK';

That should restore the old handler on exit from the block.

I think if you use a perl module that monkeys with the signal handlers
for any signal postgres uses all bets are off.

Sure, but how expensive would it be for pl/perl to do this
automatically ?

Probably not very. It could possibly be added to
plc_perlboot.pl::mkfuncsrc() after the prolog, or maybe before.

cheers

andrew

#8Alexey Klyukin
alexk@commandprompt.com
In reply to: Alvaro Herrera (#5)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Aug 4, 2011, at 5:25 PM, Alvaro Herrera wrote:

Excerpts from Hannu Krosing's message of jue ago 04 09:53:40 -0400 2011:

On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:

On 08/04/2011 09:07 AM, Hannu Krosing wrote:

I have been helping some people debug a SIGALRM-related crash
induced by using pl/perlu HTTP GET functionality.

I have so far been able to reproduce the crash only on 64-bit Debian
machines. A DB create script and instructions for reproducing the crash
are attached.

The crash is related to something leaving behind a bad SIGALRM handler,
as it can be (kind of) fixed by resetting SIGALRM to nothing using a Perl
function

So doesn't this look like a bug in the perl module that sets the signal
handler and doesn't restore it?

I vaguely remember looking in the guts of LWP::UserAgent a few years ago
and being rather annoyed at the way it dealt with sigalrm -- it
interfered with other uses we had for the signal. I think we had to run
a patched version of that module or something, not sure.

What happens if you wrap the calls to the module like this?:

{
local $SIG{ALRM};
# do LWP stuff here
}
return 'OK';

That should restore the old handler on exit from the block.

Sure, but how expensive would it be for pl/perl to do this
automatically ?

Probably too much, but then since this is an untrusted pl my guess is
that it's OK to request the user to do it only in functions that need
it. I wonder if we could have a check on return from a function that
the sighandler is still what we had before the function was called, to
help discover this problem.

If we can do that, then why not go a step further and restore the old
signal handler on mismatch?

--
Command Prompt, Inc. http://www.CommandPrompt.com
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#6)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On 08/04/2011 10:28 AM, Tom Lane wrote:

How can anything like that possibly work with any reliability
whatsoever? If the signal comes in, you don't know whether it was
triggered by the event Postgres expected, or the event the perl module
expected, and hence there's no way to deliver it to the right signal
handler (not that the code you're describing is even trying to do that).

True.

What *I'd* like is a way to prevent libperl from touching the host
application's signal handlers at all. Sadly, Perl does not actually
think of itself as an embedded library, and therefore thinks it owns all
resources of the process and can diddle them without anybody's
permission.

I'm not sure how perl (or any loadable library) could restrict that in
loaded C code, which many perl modules call directly or indirectly. It's
as open as, say, a loadable C function is in Postgres ;-) You have a
gun. It's loaded. If you point it at your foot and pull the trigger
don't blame us. I think you just need to be very careful about what you
do with plperlu. Don't be surprised if things break.

cheers

andrew

#10Hannu Krosing
hannu@2ndQuadrant.com
In reply to: Tom Lane (#6)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Thu, 2011-08-04 at 10:28 -0400, Tom Lane wrote:

Hannu Krosing <hannu@krosing.net> writes:

On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:

On 08/04/2011 09:07 AM, Hannu Krosing wrote:

The crash is related to something leaving behind a bad SIGALRM handler,

So doesn't this look like a bug in the perl module that sets the signal
handler and doesn't restore it?
I think if you use a perl module that monkeys with the signal handlers
for any signal postgres uses all bets are off.

Sure, but how expensive would it be for pl/perl to do this
automatically ?

How can anything like that possibly work with any reliability
whatsoever? If the signal comes in, you don't know whether it was
triggered by the event Postgres expected, or the event the perl module
expected, and hence there's no way to deliver it to the right signal
handler (not that the code you're describing is even trying to do that).

What *I'd* like is a way to prevent libperl from touching the host
application's signal handlers at all. Sadly, Perl does not actually
think of itself as an embedded library, and therefore thinks it owns all
resources of the process and can diddle them without anybody's
permission.

It then seems that it is a good idea to treat any fiddling with
PostgreSQL signal handlers as an error, and raise an ERROR if any signal
handler has changed between calling the function and its return, in the
way suggested by Alvaro.

This at least forces the developer to pay attention and, in the case of
a pl/perl function, use something like the

{
local $SIG{ALRM};
# do LWP stuff here
}
return 'OK';

trick suggested by Andrew Dunstan.

I know that this is not the real solution, but at least it is easier to
debug than leaving around signal handlers pointing to non-existent
code, which will trigger the next time the deadlock checker tries to run.

--
-------
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/

#11Alex Hunsaker
badalex@gmail.com
In reply to: Andrew Dunstan (#9)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Thu, Aug 4, 2011 at 09:11, Andrew Dunstan <andrew@dunslane.net> wrote:

What *I'd* like is a way to prevent libperl from touching the host
application's signal handlers at all.  Sadly, Perl does not actually
think of itself as an embedded library, and therefore thinks it owns all
resources of the process and can diddle them without anybody's
permission.

I'm not sure how perl (or any loadable library) could restrict that in
loaded C code, which many perl modules call directly or indirectly. It's as
open as, say, a loadable C function is in Postgres ;-) You have a gun. It's
loaded. If you point it at your foot and pull the trigger don't blame us. I
think you just need to be very careful about what you do with plperlu. Don't
be surprised if things break.

Well, we can't prevent perl XS (aka C) from messing with signals (and
other modules like POSIX that expose things like sigprocmask,
siglongjmp etc.), but we could prevent plperl(u) from playing with
signals at the perl level, à la %SIG.

[ IIRC I proposed doing something about this when we were talking
about the whole Safe mess, but I think there was too much other
discussion going on at the time :-) ]

Mainly the options I'm thinking about are:
1) if anyone touches %SIG die
2) turn %SIG into a regular hash so people can set/play with %SIG, but
it has no real effect.
3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)
4) if we can't get any of the above to work we can save each %SIG
handler before and restore them after each trigger call. (mod_perl
does something similar so I'm fairly certain we should be able to get
that to work)

Thoughts?

#12David E. Wheeler
david@kineticode.com
In reply to: Alex Hunsaker (#11)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:

Mainly the options I'm thinking about are:
1) if anyone touches %SIG die
2) turn %SIG into a regular hash so people can set/play with %SIG, but
it has no real effect.

These would disable stuff like $SIG{__WARN__} and $SIG{__DIE__}, which would be an unfortunate side-effect.

3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)

+1

4) if we can't get any of the above to work we can save each %SIG
handler before and restore them after each trigger call. (mod_perl
does something similar so I'm fairly certain we should be able to get
that to work)

+1

Best,

David

#13Alex Hunsaker
badalex@gmail.com
In reply to: David E. Wheeler (#12)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote:

On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:

Mainly the options I'm thinking about are:
1) if anyone touches %SIG die
2) turn %SIG into a regular hash so people can set/play with %SIG, but
it has no real effect.

These would disable stuff like $SIG{__WARN__} and $SIG{__DIE__}, which would be an unfortunate side-effect.

Yeah, good point.

3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)

+1

That seems to be what most people up-thread thought as well. I don't
see it being too expensive. I'll see if I can whip something up today.

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alex Hunsaker (#13)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

Alex Hunsaker <badalex@gmail.com> writes:

On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote:

On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:

3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)

+1

That seems to be what most people up-thread thought as well. I don't
see it being too expensive. I'll see if I can whip something up today.

The scenario I was imagining was:

1. perl temporarily takes over SIGALRM.

2. while perl function is running, statement_timeout expires, causing
SIGALRM to be delivered.

3. perl code is probably totally confused, and even if it isn't,
statement_timeout will not be enforced since Postgres won't ever get the
interrupt.
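The three steps can be replayed in a small stand-alone C program, using `raise()` to stand in for the timer expiring. This is a sketch under stated assumptions: the handler names and `host_saw_timeout_during_takeover` are hypothetical, and a single-threaded process is assumed so that `raise()` delivers the signal before it returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Flags set by the two competing handlers. */
static volatile sig_atomic_t host_timeout_fired = 0;
static volatile sig_atomic_t library_alarm_fired = 0;

/* Hypothetical host handler, standing in for the one that enforces a timeout. */
static void host_timeout_handler(int signo)
{
    (void) signo;
    host_timeout_fired = 1;
}

/* Hypothetical handler installed by the embedded interpreter or a module. */
static void library_alarm_handler(int signo)
{
    (void) signo;
    library_alarm_fired = 1;
}

/* Install `handler` for SIGALRM; this sketch treats failure as a bug. */
static void set_sigalrm(void (*handler)(int))
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);
    (void) rc;
}

/*
 * Replay the scenario and report whether the host ever saw its timeout.
 * Returns 1 if the host handler ran, 0 if the signal went elsewhere.
 */
static int host_saw_timeout_during_takeover(void)
{
    set_sigalrm(host_timeout_handler);   /* host arms its own handler */
    set_sigalrm(library_alarm_handler);  /* step 1: library takes over SIGALRM */
    (void) raise(SIGALRM);               /* step 2: the timeout "expires" */
    return host_timeout_fired != 0;      /* step 3: host never got the interrupt */
}
```

Restoring the host handler afterwards does not help here, because the one signal the host was waiting for has already been consumed by the other handler.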

Even if you don't think statement_timeout is a particularly critical
piece of functionality, similar interference with the delivery of, say,
SIGUSR1 would be catastrophic.

How do you propose to prevent this sort of problem?

regards, tom lane

#15Alex Hunsaker
badalex@gmail.com
In reply to: Tom Lane (#14)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Thu, Aug 4, 2011 at 17:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alex Hunsaker <badalex@gmail.com> writes:

On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote:

On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:

3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)

+1

That seems to be what most people up-thread thought as well. I don't
see it being too expensive. I'll see if I can whip something up today.

The scenario I was imagining was:
[ $SIG{ALRM} + statement timeout-- what happens?]
....
Even if you don't think statement_timeout is a particularly critical
piece of functionality, similar interference with the delivery of, say,
SIGUSR1 would be catastrophic.

Yipes, I see your point.

How do you propose to prevent this sort of problem?

Well, I think that makes it unworkable.

So back to #1 or #2.

For plperlu sounds like we are going to need to disallow setting _any_
signals (minus __DIE__ and __WARN__). I should be able to make it so
when you try it gives you a warning something along the lines of
"plperl can't set signal handlers, ignoring...".

For plperl I think we should probably do the same. It seems like
Andrew might disagree though? Anyone else want to chime in on if
plperl lets you muck with signal handlers?

I'm not entirely sure how much of this is workable; I still need to go
through perl's guts and see. At the very worst I think we can mark
each signal handler that is exposed in %SIG readonly (which would mean
we would die instead of warning), but I think I can make the warning
variant workable as well.

I also have not dug deep enough to know how to handle __WARN__ and
__DIE__ (and exactly what limitations allowing those will impose). I
still have some work at $day_job before I can really dig into this.

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Alex Hunsaker (#15)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On 08/04/2011 08:44 PM, Alex Hunsaker wrote:

On Thu, Aug 4, 2011 at 17:52, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Alex Hunsaker<badalex@gmail.com> writes:

On Thu, Aug 4, 2011 at 16:34, David E. Wheeler<david@kineticode.com> wrote:

On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:

3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)

+1

That seems to be what most people up-thread thought as well. I don't
see it being too expensive. I'll see if I can whip something up today.

The scenario I was imagining was:
[ $SIG{ALRM} + statement timeout-- what happens?]
....
Even if you don't think statement_timeout is a particularly critical
piece of functionality, similar interference with the delivery of, say,
SIGUSR1 would be catastrophic.

Yipes, I see your point.

How do you propose to prevent this sort of problem?

Well, I think that makes it unworkable.

So back to #1 or #2.

For plperlu sounds like we are going to need to disallow setting _any_
signals (minus __DIE__ and __WARN__). I should be able to make it so
when you try it gives you a warning something along the lines of
"plperl can't set signal handlers, ignoring...".

For plperl I think we should probably do the same. It seems like
Andrew might disagree though? Anyone else want to chime in on if
plperl lets you muck with signal handlers?

I'm not entirely sure how much of this is workable; I still need to go
through perl's guts and see. At the very worst I think we can mark
each signal handler that is exposed in %SIG readonly (which would mean
we would die instead of warning), but I think I can make the warning
variant workable as well.

I also have not dug deep enough to know how to handle __WARN__ and
__DIE__ (and exactly what limitations allowing those will impose). I
still have some work at $day_job before I can really dig into this.

Let's slow down a bit. Nobody that we know of has encountered the
problem Tom's referring to, over all the years plperlu has been
available. The changes you're proposing have the potential to downgrade
the usefulness of plperlu considerably without fixing anything that's
known to be an actual problem. Instead of fixing a problem caused by
using LWP you could well make LWP totally unusable from plperlu.

And it still won't do a thing about signal handlers installed by C code.

And plperlu would be the tip of the iceberg. What about all the other
PLs, not to mention non-PL loadable modules?

But we *can* fix the original problem reported, namely failure to
restore signal handlers on function exit, with very little downside
(assuming it's shown to be fairly cheap).

cheers

andrew

#17Alex Hunsaker
badalex@gmail.com
In reply to: Andrew Dunstan (#16)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Thu, Aug 4, 2011 at 19:40, Andrew Dunstan <andrew@dunslane.net> wrote:

Let's slow down a bit. Nobody that we know of has encountered the problem
Tom's referring to, over all the years plperlu has been available. The
changes you're proposing have the potential to downgrade the usefulness of
plperlu considerably without fixing anything that's known to be an actual
problem. Instead of fixing a problem caused by using LWP you could well make
LWP totally unusable from plperlu.

Well, I'm not sure about it making LWP totally unusable... You could
always use statement_timeout if you were worried about it blocking
^_^.

And it still won't do a thing about signal handlers installed by C code.

And plperlu would be the tip of the iceberg. What about all the other PLs,
not to mention non-PL loadable modules?

Maybe the answer is to re-issue the appropriate pqsignal() calls instead
of doing the perl variant?

For PL/Perl(u) we could still disallow any signals the postmaster
uses, from my quick look that would be: HUP, INT, TERM, QUIT, ALRM,
PIPE, USR1, USR2, FPE. All we would need to do is restore ALRM.
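The "re-issue the handler" idea differs from save-and-restore in that it does not trust whatever disposition it finds on entry; it simply reinstalls the handler the host wants after every call. A rough sketch, using plain `sigaction()` as a stand-in for the backend's own signal setup (all function names below are hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handlers: one owned by the host, one left behind by a library. */
static void host_alarm_handler(int signo)
{
    (void) signo;
}

static void library_alarm_handler(int signo)
{
    (void) signo;
}

/* Install `handler` for SIGALRM. Returns 0 on success, -1 on failure. */
static int set_sigalrm(void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}

/* A callee that leaves signal handling alone. */
static void well_behaved_call(void)
{
}

/* A callee that replaces the SIGALRM handler and does not restore it. */
static void careless_library_call(void)
{
    (void) set_sigalrm(library_alarm_handler);
}

/*
 * Run fn(), then unconditionally reinstall the host's SIGALRM handler,
 * regardless of what was installed before or during the call.
 */
static int call_then_reinstall_host_sigalrm(void (*fn)(void))
{
    fn();
    return set_sigalrm(host_alarm_handler);
}

/* Report whether the host handler is the current SIGALRM handler. */
static int host_handler_installed(void)
{
    struct sigaction cur;
    int rc;

    rc = sigaction(SIGALRM, NULL, &cur);
    assert(rc == 0);
    (void) rc;
    return cur.sa_handler == host_alarm_handler;
}
```

One consequence of this design is that it also recovers when an earlier call already left a foreign handler behind, which a pure save-and-restore wrapper would faithfully put back.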

Or am I too paranoid about someone shooting themselves in the foot via
USR1? (BTW you can set signals in plperl, but you can't call alarm()
or kill())

#18Andrew Dunstan
andrew@dunslane.net
In reply to: Alex Hunsaker (#17)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On 08/04/2011 11:23 PM, Alex Hunsaker wrote:

On Thu, Aug 4, 2011 at 19:40, Andrew Dunstan<andrew@dunslane.net> wrote:

Let's slow down a bit. Nobody that we know of has encountered the problem
Tom's referring to, over all the years plperlu has been available. The
changes you're proposing have the potential to downgrade the usefulness of
plperlu considerably without fixing anything that's known to be an actual
problem. Instead of fixing a problem caused by using LWP you could well make
LWP totally unusable from plperlu.

Well, I'm not sure about it making LWP totally unusable... You could
always use statement_timeout if you were worried about it blocking
^_^.

Making users set statement_timeout would be a degradation in utility.
For one thing it means you'd never be able to get back and handle an
unresponsiveness reply. And it would be extra work.

(I don't use LWP in any plperlu code AFAIR, but I do use other things
that could well want to set signals. At the very least a change like
this would mandate a LOT of extra testing by my clients.)

And it still won't do a thing about signal handlers installed by C code.

And plperlu would be the tip of the iceberg. What about all the other PLs,
not to mention non-PL loadable modules?

Maybe the answer is to re-issue the appropriate pqsignal() calls instead
of doing the perl variant?

For PL/Perl(u) we could still disallow any signals the postmaster
uses, from my quick look that would be: HUP, INT, TERM, QUIT, ALRM,
PIPE, USR1, USR2, FPE. All we would need to do is restore ALRM.

Or am I too paranoid about someone shooting themselves in the foot via
USR1? (BTW you can set signals in plperl, but you can't call alarm()
or kill())

This whole thing is a massive over-reaction to a problem we almost
certainly know how to fix fairly simply and relatively painlessly, and
attempts (unsuccessfully, at least insofar as comprehensiveness is
concerned) to fix a problem nobody's actually reported having AFAIK.

cheers

andrew

#19Alex Hunsaker
badalex@gmail.com
In reply to: Andrew Dunstan (#18)
1 attachment(s)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Fri, Aug 5, 2011 at 08:53, Andrew Dunstan <andrew@dunslane.net> wrote:

On 08/04/2011 11:23 PM, Alex Hunsaker wrote:

[ ... don't let people set signal handlers postgres sets ]

This whole thing is a massive over-reaction to a problem we almost certainly
know how to fix fairly simply and relatively painlessly, and attempts
(unsuccessfully, at least insofar as comprehensiveness is concerned) to fix
a problem nobody's actually reported having AFAIK.

*shrug* OK.

Find attached a version that does the equivalent of local %SIG for
each pl/perl(u) call. I was only able to test on 5.14.1, but I looked
at 5.8.9 to make sure it looks like it works.

It's a tad slower (something like 0.00037 ms per call), but uhh that's
quite acceptable IMHO (best of 5 runs):

=> create or replace function simple() returns void as $$ $$ language plperl;
CREATE FUNCTION

-- pre patch
=> select count(simple()) from generate_series(1, 10000000);
Time: 10219.149 ms

-- patched
=> select count(simple()) from generate_series(1, 10000000);
Time: 13924.025 ms
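The per-call figure follows directly from the two timings above: the difference between the runs divided by the 10,000,000 calls. A small sketch of that arithmetic (the helper names are made up for illustration):

```c
#include <assert.h>

/* Extra time per call, in milliseconds, given total timings for `calls` calls. */
static double per_call_overhead_ms(double before_ms, double after_ms, double calls)
{
    assert(calls > 0.0);
    return (after_ms - before_ms) / calls;
}

/* Slowdown of the patched run relative to the unpatched run, as a fraction. */
static double relative_slowdown(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return (after_ms - before_ms) / before_ms;
}
```

With the numbers from this run, `per_call_overhead_ms(10219.149, 13924.025, 10000000.0)` comes to about 0.00037 ms, and `relative_slowdown(10219.149, 13924.025)` to roughly 36% for a function with an empty body.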

Thoughts?

Attachments:

plperl_local_sig.patch (text/x-patch; charset=US-ASCII)
*** a/src/pl/plperl/expected/plperl.out
--- b/src/pl/plperl/expected/plperl.out
***************
*** 639,641 **** CONTEXT:  PL/Perl anonymous code block
--- 639,643 ----
  DO $do$ use warnings FATAL => qw(void) ; my @y; my $x = sort @y; 1; $do$ LANGUAGE plperl;
  ERROR:  Useless use of sort in scalar context at line 1.
  CONTEXT:  PL/Perl anonymous code block
+ DO $do$ die "$SIG{'ALARM'} set!" if($SIG{'ALARM'}); $SIG{'ALARM'} = sub { print "alarm!\n"}; $do$ LANGUAGE plperl;
+ DO $do$ die "$SIG{'ALARM'} set!" if($SIG{'ALARM'}); $do$ LANGUAGE plperl;
*** a/src/pl/plperl/plperl.c
--- b/src/pl/plperl/plperl.c
***************
*** 1895,1906 **** plperl_call_perl_func(plperl_proc_desc *desc, FunctionCallInfo fcinfo)
--- 1895,1912 ----
  {
  	dSP;
  	SV		   *retval;
+ 	GV		   *gv;
  	int			i;
  	int			count;
  
  	ENTER;
  	SAVETMPS;
  
+ 	gv = gv_fetchpv("SIG", 0, SVt_PVHV);
+ 	if (!gv)
+ 		elog(ERROR, "couldn't fetch %%SIG");
+ 	save_hash(gv);			/* local %SIG */
+ 
  	PUSHMARK(SP);
  	EXTEND(sp, desc->nargs);
  
***************
*** 1976,1981 **** plperl_call_perl_trigger_func(plperl_proc_desc *desc, FunctionCallInfo fcinfo,
--- 1982,1988 ----
  	dSP;
  	SV		   *retval,
  			   *TDsv;
+ 	GV		   *gv;
  	int			i,
  				count;
  	Trigger    *tg_trigger = ((TriggerData *) fcinfo->context)->tg_trigger;
***************
*** 1986,1995 **** plperl_call_perl_trigger_func(plperl_proc_desc *desc, FunctionCallInfo fcinfo,
  	TDsv = get_sv("_TD", 0);
  	if (!TDsv)
  		elog(ERROR, "couldn't fetch $_TD");
- 
  	save_item(TDsv);			/* local $_TD */
  	sv_setsv(TDsv, td);
  
  	PUSHMARK(sp);
  	EXTEND(sp, tg_trigger->tgnargs);
  
--- 1993,2006 ----
  	TDsv = get_sv("_TD", 0);
  	if (!TDsv)
  		elog(ERROR, "couldn't fetch $_TD");
  	save_item(TDsv);			/* local $_TD */
  	sv_setsv(TDsv, td);
  
+ 	gv = gv_fetchpv("SIG", 0, SVt_PVHV);
+ 	if (!gv)
+ 		elog(ERROR, "couldn't fetch %%SIG");
+ 	save_hash(gv);			/* local %SIG */
+ 
  	PUSHMARK(sp);
  	EXTEND(sp, tg_trigger->tgnargs);
  
*** a/src/pl/plperl/sql/plperl.sql
--- b/src/pl/plperl/sql/plperl.sql
***************
*** 415,417 **** DO $do$ use strict; my $name = "foo"; my $ref = $$name; $do$ LANGUAGE plperl;
--- 415,420 ----
  -- check that we can "use warnings" (in this case to turn a warn into an error)
  -- yields "ERROR:  Useless use of sort in scalar context."
  DO $do$ use warnings FATAL => qw(void) ; my @y; my $x = sort @y; 1; $do$ LANGUAGE plperl;
+ 
+ DO $do$ die "$SIG{'ALARM'} set!" if($SIG{'ALARM'}); $SIG{'ALARM'} = sub { print "alarm!\n"}; $do$ LANGUAGE plperl;
+ DO $do$ die "$SIG{'ALARM'} set!" if($SIG{'ALARM'}); $do$ LANGUAGE plperl;
#20Tim Bunce
Tim.Bunce@pobox.com
In reply to: Alex Hunsaker (#19)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

[I've included a summary of the thread and Bcc'd this to perl5-porters
for a sanity check. Please trim heavily when replying.]

On Thu, Aug 04, 2011 at 09:42:31AM -0400, Andrew Dunstan wrote:

So doesn't this look like a bug in the perl module that sets the
signal handler and doesn't restore it?

What happens if you wrap the calls to the module like this?:

{
local $SIG{ALRM};
# do LWP stuff here
}
return 'OK';

That should restore the old handler on exit from the block.

I think if you use a perl module that monkeys with the signal
handlers for any signal postgres uses all bets are off.

On Thu, Aug 04, 2011 at 10:28:45AM -0400, Tom Lane wrote:

Sure, but how expensive would it be for pl/perl to do this
automatically ?

How can anything like that possibly work with any reliability
whatsoever? If the signal comes in, you don't know whether it was
triggered by the event Postgres expected, or the event the perl module
expected, and hence there's no way to deliver it to the right signal
handler (not that the code you're describing is even trying to do that).

What *I'd* like is a way to prevent libperl from touching the host
application's signal handlers at all. Sadly, Perl does not actually
think of itself as an embedded library, and therefore thinks it owns all
resources of the process and can diddle them without anybody's
permission.

The PERL_IMPLICIT_SYS mechanism addresses this. Unfortunately it only
works with USE_ITHREADS on Windows currently.
http://perldoc.perl.org/perlguts.html#Future-Plans-and-PERL_IMPLICIT_SYS

On Thu, Aug 04, 2011 at 04:09:47PM -0600, Alex Hunsaker wrote:

Well we can't prevent perl XS (aka C) from messing with signals (and
other modules like POSIX that expose things like sigprocmask,
siglongjmp etc.), but we could prevent plperl(u) from playing with
signals on the perl level ala %SIG.

[ IIRC I proposed doing something about this when we were talking
about the whole Safe mess, but I think there was too much other
discussion going on at the time :-) ]

Mainly the options I'm thinking about are:
1) if anyone touches %SIG die
2) turn %SIG into a regular hash so people can set/play with %SIG, but
it has no real effect.
3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)
4) if we can't get any of the above to work we can save each %SIG
handler before and restore them after each trigger call. (mod_perl
does something similar so I'm fairly certain we should be able to get
that to work)

On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote:

1) if anyone touches %SIG die
2) turn %SIG into a regular hash so people can set/play with %SIG, but
it has no real effect.

These would disable stuff like $SIG{__WARN__} and $SIG{__DIE__}, which would be an unfortunate side-effect.

On Thu, Aug 04, 2011 at 07:52:45PM -0400, Tom Lane wrote:

The scenario I was imagining was:

1. perl temporarily takes over SIGALRM.

2. while perl function is running, statement_timeout expires, causing
SIGALRM to be delivered.

3. perl code is probably totally confused, and even if it isn't,
statement_timeout will not be enforced since Postgres won't ever get the
interrupt.

Even if you don't think statement_timeout is a particularly critical
piece of functionality, similar interference with the delivery of, say,
SIGUSR1 would be catastrophic.

How do you propose to prevent this sort of problem?

I don't think there's a complete solution for that particular scenario.
[Though redirecting the perl alarm() opcode to code that would check the
argument against the remaining seconds before statement_timeout expires,
might get close.]

On Thu, Aug 04, 2011 at 06:44:18PM -0600, Alex Hunsaker wrote:

How do you propose to prevent this sort of problem?

Well, I think that makes it unworkable.

So back to #1 or #2.

For plperlu sounds like we are going to need to disallow setting _any_
signals (minus __DIE__ and __WARN__). I should be able to make it so
when you try it gives you a warning something along the lines of
"plperl can't set signal handlers, ignoring...".

For plperl I think we should probably do the same. It seems like
Andrew might disagree though? Anyone else want to chime in on if
plperl lets you muck with signal handlers?

I'm not entirely sure how much of this is workable; I still need to go
through perl's guts and see. At the very worst I think we can mark
each signal handler that is exposed in %SIG readonly (which would mean
we would die instead of warning), but I think I can make the warning
variant workable as well.

I also have not dug deep enough to know how to handle __WARN__ and
__DIE__ (and exactly what limitations allowing those will impose). I
still have some work at $day_job before I can really dig into this.

On Thu, Aug 04, 2011 at 09:40:57PM -0400, Andrew Dunstan wrote:

Let's slow down a bit. Nobody that we know of has encountered the
problem Tom's referring to, over all the years plperlu has been
available. The changes you're proposing have the potential to
downgrade the usefulness of plperlu considerably without fixing
anything that's known to be an actual problem. Instead of fixing a
problem caused by using LWP you could well make LWP totally unusable
from plperlu.

And it still won't do a thing about signal handlers installed by C code.

And plperlu would be the tip of the iceberg. What about all the
other PLs, not to mention non-PL loadable modules?

But we *can* fix the original problem reported, namely failure to
restore signal handlers on function exit, with very little downside
(assuming it's shown to be fairly cheap).

On Thu, Aug 04, 2011 at 09:23:49PM -0600, Alex Hunsaker wrote:

And plperlu would be the tip of the iceberg. What about all the other PLs,
not to mention non-PL loadable modules?

Maybe the answer is to re-issue the appropriate pqsignals() instead of
doing the perl variant?

For PL/Perl(u) we could still disallow any signals the postmaster
uses, from my quick look that would be: HUP, INT, TERM, QUIT, ALRM,
PIPE, USR1, USR2, FPE. All we would need to do is restore ALRM.

Or am I too paranoid about someone shooting themselves in the foot via
USR1? (BTW you can set signals in plperl, but you can't call alarm()
or kill())

On Fri, Aug 05, 2011 at 10:53:21AM -0400, Andrew Dunstan wrote:

This whole thing is a massive over-reaction to a problem we almost
certainly know how to fix fairly simply and relatively painlessly,
and attempts (unsuccessfully, at least insofar as comprehensiveness
is concerned) to fix a problem nobody's actually reported having
AFAIK.

For plperl, as Alex noted above, kill() and alarm() can't be used but
%SIG can be altered. Locally making %SIG readonly for plperl subs
(after __DIE__ and __WARN__ are added) seems cheap and effective.

For plperlu, clearly $SIG{ALRM} is useful. Enforcing localization, thus
fixing the immediate problem, and documenting that it won't work
reliably with statement_timeout, seems like a reasonable approach.

plperlu is already a potential footgun in countless ways. Documenting
that other signal handlers, like USR1, shouldn't be used ought to be enough.

On Sat, Aug 06, 2011 at 12:37:28PM -0600, Alex Hunsaker wrote:

*shrug* OK.

Find attached a version that does the equivalent of local %SIG for
each pl/perl(u) call.

+ gv = gv_fetchpv("SIG", 0, SVt_PVHV);
+ save_hash(gv); /* local %SIG */

After a little digging and some discussion on the #p5p channel [thanks
to ilmari++ leont++ and sorear++ for their help] it seems that local(%SIG)
doesn't do what you might expect. The %SIG does become empty but the OS
level handlers, even those installed by perl, *aren't changed*:

$ perl -wE '$SIG{INT} = sub { say "Foo"}; { local %SIG; kill "INT", $$; };'
Foo

And, even worse, they're not reset at scope exit:

$ perl -wE '$SIG{INT} = sub { say "Foo"}; { local %SIG; $SIG{INT} = sub {say "Bar" }} kill "INT", $$;'
Bar

That sure seems like a bug (I'll check with the perl5-porters list).

Localizing an individual element of %SIG works fine.
In C that's something like this (untested):

hv = gv_fetchpv("SIG", 0, SVt_PVHV);
keysv = ...SV containing "ALRM"...
he = hv_fetch_ent(hv, keysv, 0, 0);
if (he) { /* arrange to restore existing elem */
save_helem_flags(hv, keysv, &HeVAL(he), SAVEf_SETMAGIC);
}
else { /* arrange to delete a new elem */
SAVEHDELETE(hv, keysv);
}

Tim.

#21Andrew Dunstan
andrew@dunslane.net
In reply to: Tim Bunce (#20)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On 08/07/2011 07:06 PM, Tim Bunce wrote:

After a little digging and some discussion on the #p5p channel [thanks
to ilmari++ leont++ and sorear++ for their help] it seems that local(%SIG)
doesn't do what you might expect. The %SIG does become empty but the OS
level handlers, even those installed by perl, *aren't changed*:

$ perl -wE '$SIG{INT} = sub { say "Foo"}; { local %SIG; kill "INT", $$; };'
Foo
And, even worse, they're not reset at scope exit:

$ perl -wE '$SIG{INT} = sub { say "Foo"}; { local %SIG; $SIG{INT} = sub {say "Bar" }} kill "INT", $$;'
Bar

That sure seems like a bug (I'll check with the perl5-porters list).

Yeah, that seems very bad. :-(

Localizing an individual element of %SIG works fine.
In C that's something like this (untested):

hv = gv_fetchpv("SIG", 0, SVt_PVHV);
keysv = ...SV containing "ALRM"...
he = hv_fetch_ent(hv, keysv, 0, 0);
if (he) { /* arrange to restore existing elem */
save_helem_flags(hv, keysv,&HeVAL(he), SAVEf_SETMAGIC);
}
else { /* arrange to delete a new elem */
SAVEHDELETE(hv, keysv);
}

Hmm. I think we'll need to test how much it's going to cost to add that
to every plperl (or maybe just every plperlu) function call for the six
or so signals we use.

cheers

andrew

#22Alex Hunsaker
badalex@gmail.com
In reply to: Tim Bunce (#20)
1 attachment(s)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Sun, Aug 7, 2011 at 17:06, Tim Bunce <Tim.Bunce@pobox.com> wrote:

On Sat, Aug 06, 2011 at 12:37:28PM -0600, Alex Hunsaker wrote:

...
Find attached a version that does the equivalent of local %SIG for
each pl/perl(u) call.

+     gv = gv_fetchpv("SIG", 0, SVt_PVHV);
+     save_hash(gv);                  /* local %SIG */

... [ local %SIG doesn't work ] The %SIG does become empty but the OS
level handlers, even those installed by perl, *aren't changed*:

Looks like I trusted in $SIG{'ALRM'} being undef after it had been set
in a different scope too much :-( Thanks for pointing this out.

That sure seems like a bug (I'll check with the perl5-porters list).

Well, even if it was deemed a bug, it doesn't do us any good.

Localizing an individual element of %SIG works fine.
In C that's something like this (untested):

   hv = gv_fetchpv("SIG", 0, SVt_PVHV);
   keysv = ...SV containing "ALRM"...
   he = hv_fetch_ent(hv, keysv, 0, 0);
   if (he) {  /* arrange to restore existing elem */
       save_helem_flags(hv, keysv, &HeVAL(he), SAVEf_SETMAGIC);
   }
   else {     /* arrange to delete a new elem */
       SAVEHDELETE(hv, keysv);
   }

I played with this a bit... and found yes, it locals them but no it
does not fix the reported problem. After playing with things a bit
more I found even "local $SIG{'ALRM'} = ...; alarm(1);" still results
in postgres crashing. To wit, local does squat. AFAICT it just resets
the signal handler back to the default with SIG_DFL. (Which in
hindsight I don't know what else I expected it to do...)

So I think for this to be robust we would have to detect what signals
they set and then reset those back to what postgres wants. Doable, but
is it worth it? Anyone else have any bright ideas?

Find below my test case and attached a patch that locals individual
%SIG elements the way mentioned above.

=> set statement_timeout to '5s';
SET

=> create or replace function test_alarm() returns void as $$ local
$SIG{'ALRM'} = sub { warn "alarm"; }; alarm(1); sleep 2; $$ language
plperlu;
CREATE FUNCTION

=> select test_alarm();
WARNING: alarm at line 1.
CONTEXT: PL/Perl function "test_alarm"
test_alarm
------------

(1 row)

=> select pg_sleep(6);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Server Log:
WARNING: alarm at line 1.
CONTEXT: PL/Perl function "test_alarm"
LOG: server process (PID 32659) was terminated by signal 14: Alarm clock
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
FATAL: the database system is in recovery mode

Attachments:

plperl_local_sig_v2.patch (text/x-patch; charset=US-ASCII)
*** a/src/pl/plperl/expected/plperl.out
--- b/src/pl/plperl/expected/plperl.out
***************
*** 639,641 **** CONTEXT:  PL/Perl anonymous code block
--- 639,643 ----
  DO $do$ use warnings FATAL => qw(void) ; my @y; my $x = sort @y; 1; $do$ LANGUAGE plperl;
  ERROR:  Useless use of sort in scalar context at line 1.
  CONTEXT:  PL/Perl anonymous code block
+ DO $do$ die "SIG ALRM is set: $SIG{'ALRM'}" if($SIG{'ALRM'}); $SIG{'ALRM'} = sub { print "alarm!\n"}; $do$ LANGUAGE plperl;
+ DO $do$ die "SIG ALRM is set: $SIG{'ALRM'}" if($SIG{'ALRM'}); $do$ LANGUAGE plperl;
*** a/src/pl/plperl/plperl.c
--- b/src/pl/plperl/plperl.c
***************
*** 268,273 **** static void plperl_inline_callback(void *arg);
--- 268,275 ----
  static char *strip_trailing_ws(const char *msg);
  static OP  *pp_require_safe(pTHX);
  static void activate_interpreter(plperl_interp_desc *interp_desc);
+ static void local_sigs(void);
+ static void local_sig(HV *hv, SV *tmpsv, const char *signame);
  
  #ifdef WIN32
  static char *setlocale_perl(int category, char *locale);
***************
*** 1901,1906 **** plperl_call_perl_func(plperl_proc_desc *desc, FunctionCallInfo fcinfo)
--- 1903,1910 ----
  	ENTER;
  	SAVETMPS;
  
+ 	local_sigs();
+ 
  	PUSHMARK(SP);
  	EXTEND(sp, desc->nargs);
  
***************
*** 1968,1973 **** plperl_call_perl_func(plperl_proc_desc *desc, FunctionCallInfo fcinfo)
--- 1972,2028 ----
  	return retval;
  }
  
+ /*
+  * local all of our sig handlers some modules like LWP like to set an alarm sig
+  * handler for things like network timeouts, this can cause bad stuff to happen
+  * (not to mention what happens if someone sets USR1)
+  *
+  * for now we just local() them all so they should get reset back to what
+  * postgres expects when their pl function is done
+  */
+ static void
+ local_sigs(void)
+ {
+ 	HV	*hv;
+ 	SV	*sv = newSV(9);
+ 	int i;
+ 
+ 	hv = get_hv("SIG", 0);
+ 	if (!hv)
+ 		elog(ERROR, "couldn't fetch %%SIG");
+ 
+ 	/*
+ 	 * char *PL_sig_name[] has the signal name in %SIG indexed by the signal
+ 	 * number
+ 	 */
+ 	for ( i= 1; i < SIG_SIZE; i++)
+ 		local_sig(hv, sv, PL_sig_name[i]);
+ 
+ 	/*
+ 	 * Note, __DIE__ and __WARN__ are not handled by the above and you can't
+ 	 * really do the same thing with them you would need to save PL_diehook and
+ 	 * pl_warnhook somewhere. err well I think you can but then it breaks our
+ 	 * default warn and die handlers set in plc_perlboot.pl
+ 	 */
+ }
+ 
+ /*
+  * local an individual sig, helper for local_sigs
+  */
+ static void
+ local_sig(HV *hv, SV *tmpsv, const char *signame)
+ {
+ 	HE *he;
+ 	sv_setpvn(tmpsv, signame, strlen(signame));
+ 
+ 	he = hv_fetch_ent(hv, tmpsv, 0, 0);
+ 	if (he)
+ 		/* arrange to restore existing elem */
+ 		save_helem_flags(hv, tmpsv, &HeVAL(he), SAVEf_SETMAGIC);
+ 	else
+ 		/* arrange to delete new elem */
+ 		SAVEHDELETE(hv, tmpsv);
+ }
  
  static SV  *
  plperl_call_perl_trigger_func(plperl_proc_desc *desc, FunctionCallInfo fcinfo,
***************
*** 1986,1995 **** plperl_call_perl_trigger_func(plperl_proc_desc *desc, FunctionCallInfo fcinfo,
  	TDsv = get_sv("_TD", 0);
  	if (!TDsv)
  		elog(ERROR, "couldn't fetch $_TD");
- 
  	save_item(TDsv);			/* local $_TD */
  	sv_setsv(TDsv, td);
  
  	PUSHMARK(sp);
  	EXTEND(sp, tg_trigger->tgnargs);
  
--- 2041,2051 ----
  	TDsv = get_sv("_TD", 0);
  	if (!TDsv)
  		elog(ERROR, "couldn't fetch $_TD");
  	save_item(TDsv);			/* local $_TD */
  	sv_setsv(TDsv, td);
  
+ 	local_sigs();
+ 
  	PUSHMARK(sp);
  	EXTEND(sp, tg_trigger->tgnargs);
  
*** a/src/pl/plperl/sql/plperl.sql
--- b/src/pl/plperl/sql/plperl.sql
***************
*** 415,417 **** DO $do$ use strict; my $name = "foo"; my $ref = $$name; $do$ LANGUAGE plperl;
--- 415,420 ----
  -- check that we can "use warnings" (in this case to turn a warn into an error)
  -- yields "ERROR:  Useless use of sort in scalar context."
  DO $do$ use warnings FATAL => qw(void) ; my @y; my $x = sort @y; 1; $do$ LANGUAGE plperl;
+ 
+ DO $do$ die "SIG ALRM is set: $SIG{'ALRM'}" if($SIG{'ALRM'}); $SIG{'ALRM'} = sub { print "alarm!\n"}; $do$ LANGUAGE plperl;
+ DO $do$ die "SIG ALRM is set: $SIG{'ALRM'}" if($SIG{'ALRM'}); $do$ LANGUAGE plperl;
#23Tim Bunce
Tim.Bunce@pobox.com
In reply to: Alex Hunsaker (#22)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Mon, Aug 08, 2011 at 01:23:08AM -0600, Alex Hunsaker wrote:

On Sun, Aug 7, 2011 at 17:06, Tim Bunce <Tim.Bunce@pobox.com> wrote:

Localizing an individual element of %SIG works fine.
In C that's something like this (untested):

   hv = gv_fetchpv("SIG", 0, SVt_PVHV);
   keysv = ...SV containing "ALRM"...
   he = hv_fetch_ent(hv, keysv, 0, 0);
   if (he) {  /* arrange to restore existing elem */
       save_helem_flags(hv, keysv, &HeVAL(he), SAVEf_SETMAGIC);
   }
   else {     /* arrange to delete a new elem */
       SAVEHDELETE(hv, keysv);
   }

I played with this a bit... and found yes, it locals them but no it
does not fix the reported problem. After playing with things a bit
more I found even "local $SIG{'ALRM'} = ...; alarm(1);" still results
in postgres crashing. To wit, local does squat. AFAICT it just resets
the signal handler back to the default with SIG_DFL. (Which in
hindsight I don't know what else I expected it to do...)

Ah, yes. Hindsight is great. I should have spotted that. Sorry.

So I think for this to be robust we would have to detect what signals
they set and then reset those back to what postgres wants. Doable, but
is it worth it? Anyone else have any bright ideas?

I'm only considering ALRM. At least that's the only one that seems worth
offering some limited support for. The others fall under "don't do that".

After giving it some more thought it seems reasonable to simply force the
SIGALRM handler back to postgres when a plperlu function returns:

pqsignal(SIGALRM, handle_sig_alarm);

Tim.

#24Andrew Dunstan
andrew@dunslane.net
In reply to: Tim Bunce (#23)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On 08/08/2011 05:03 AM, Tim Bunce wrote:

After giving it some more thought it seems reasonable to simply force the
SIGALRM handler back to postgres when a plperlu function returns:

pqsignal(SIGALRM, handle_sig_alarm);

Maybe we need to do this in some more centralized spot. It seems
unlikely that this problem is unique to plperlu, or even just confined
to PLs.

cheers

andrew

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#24)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

Andrew Dunstan <andrew@dunslane.net> writes:

On 08/08/2011 05:03 AM, Tim Bunce wrote:

After giving it some more thought it seems reasonable to simply force the
SIGALRM handler back to postgres when a plperlu function returns:

pqsignal(SIGALRM, handle_sig_alarm);

Maybe we need to do this in some more centralized spot. It seems
unlikely that this problem is unique to plperlu, or even just confined
to PLs.

No. As I pointed out upthread, the instant somebody changes the SIGALRM
handler to a non-Postgres-aware one, you are already at risk of failure.
Setting it back later is just locking the barn door after the horses
left. Institutionalizing such a non-fix globally is even worse.

regards, tom lane

#26Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#25)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On 08/09/2011 12:22 PM, Tom Lane wrote:

Andrew Dunstan<andrew@dunslane.net> writes:

On 08/08/2011 05:03 AM, Tim Bunce wrote:

After giving it some more thought it seems reasonable to simply force the
SIGALRM handler back to postgres when a plperlu function returns:
pqsignal(SIGALRM, handle_sig_alarm);

Maybe we need to do this in some more centralized spot. It seems
unlikely that this problem is unique to plperlu, or even just confined
to PLs.

No. As I pointed out upthread, the instant somebody changes the SIGALRM
handler to a non-Postgres-aware one, you are already at risk of failure.
Setting it back later is just locking the barn door after the horses
left. Institutionalizing such a non-fix globally is even worse.

So what's your suggestion? I know what you said you'd like, but it
doesn't appear at all practical to me.

cheers

andrew

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#26)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

Andrew Dunstan <andrew@dunslane.net> writes:

On 08/09/2011 12:22 PM, Tom Lane wrote:

No. As I pointed out upthread, the instant somebody changes the SIGALRM
handler to a non-Postgres-aware one, you are already at risk of failure.
Setting it back later is just locking the barn door after the horses
left. Institutionalizing such a non-fix globally is even worse.

So what's your suggestion? I know what you said you'd like, but it
doesn't appear at all practical to me.

[ shrug... ] Installing a perl module that mucks with the signal
handlers is in the "don't do that" category. A kluge such as you
suggest will not get it out of that category; all it will do is add
useless overhead for people who are following the rules.

regards, tom lane

#28Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#27)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On 08/09/2011 04:32 PM, Tom Lane wrote:

Andrew Dunstan<andrew@dunslane.net> writes:

On 08/09/2011 12:22 PM, Tom Lane wrote:

No. As I pointed out upthread, the instant somebody changes the SIGALRM
handler to a non-Postgres-aware one, you are already at risk of failure.
Setting it back later is just locking the barn door after the horses
left. Institutionalizing such a non-fix globally is even worse.

So what's your suggestion? I know what you said you'd like, but it
doesn't appear at all practical to me.

[ shrug... ] Installing a perl module that mucks with the signal
handlers is in the "don't do that" category. A kluge such as you
suggest will not get it out of that category; all it will do is add
useless overhead for people who are following the rules.

Well, knowing what a given module might do isn't always easy (see
below). I don't much like saying to people "I told you so", especially
when following the advice isn't necessarily straightforward.

After some experimentation, I found that, at least on my system, if LWP
uses Crypt::SSLeay for https requests then it sets an alarm handler, but
if instead it uses IO::Socket::SSL an alarm handler is not set. So the
answer to the OP's original problem is probably "make sure you have
IO::Socket::SSL installed and that Crypt::SSLeay is not installed."

cheers

andrew

#29David E. Wheeler
david@kineticode.com
In reply to: Andrew Dunstan (#28)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

On Aug 10, 2011, at 9:44 AM, Andrew Dunstan wrote:

After some experimentation, I found that, at least on my system, if LWP uses Crypt::SSLeay for https requests then it sets an alarm handler, but if instead it uses IO::Socket::SSL an alarm handler is not set. So the answer to the OP's original problem is probably "make sure you have IO::Socket::SSL installed and that Crypt::SSLeay is not installed."

I think I'd also complain via bug-crypt-ssleay@rt.cpan.org that a library ought not to set signal handlers.

Best,

David

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#28)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

Andrew Dunstan <andrew@dunslane.net> writes:

On 08/09/2011 04:32 PM, Tom Lane wrote:

[ shrug... ] Installing a perl module that mucks with the signal
handlers is in the "don't do that" category. A kluge such as you
suggest will not get it out of that category; all it will do is add
useless overhead for people who are following the rules.

Well, knowing what a given module might do isn't always easy (see
below). I don't much like saying to people "I told you so", especially
when following the advice isn't necessarily straightforward.

I'm not thrilled with it either, but since we have no proposed patch
that would actually make it *safe* for perl modules to muck with the
signal handlers, I see no other alternative. A patch that simply makes
it a shade less unsafe isn't really an improvement, especially when it
has other disadvantages.

regards, tom lane

#31Andres Freund
andres@2ndquadrant.com
In reply to: Hannu Krosing (#1)
Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

Hi,

While debugging an instance of this bug I noticed that plperlu always removes
the SIGFPE handler and sets it to ignore:

andres@awork2:~$ psql -p 5435 -U postgres -h /var/run/postgresql test
Timing is on.
psql (9.1devel, server 9.1.5)
Type "help" for help.

test=# SELECT pg_backend_pid();
pg_backend_pid
----------------
9287

root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/9287/status | awk '{print $2}'
0000000001301800
0000000180006287

test=# DO LANGUAGE plperlu $$$$;

root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/9287/status | awk '{print $2}'
0000000001301880
0000000180006207

Note the 8th bit being unset in SigCgt and set in SigIgn. That's SIGFPE...

Not sure how relevant this really is, but it could cause errors to be
ignored...

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#32Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#31)
Re: plperl sigfpe reset can crash the server

On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote:

Hi,

While debugging an instance of this bug I noticed that plperlu always
removes the SIGFPE handler and sets it to ignore:

andres@awork2:~$ psql -p 5435 -U postgres -h /var/run/postgresql test
Timing is on.
psql (9.1devel, server 9.1.5)
Type "help" for help.

test=# SELECT pg_backend_pid();
pg_backend_pid
----------------
9287

root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/9287/status | awk '{print $2}'
0000000001301800
0000000180006287

test=# DO LANGUAGE plperlu $$$$;

root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/9287/status | awk '{print $2}'
0000000001301880
0000000180006207

Note the 8th bit being unset in SigCgt and set in SigIgn. That's SIGFPE...

Not sure how relevant this really is, but it could cause errors to be
ignored...

In fact it can be used to crash the server:
test=# SELECT (-2^31)::int/-1;
ERROR: floating-point exception
DETAIL: An invalid floating-point operation was signaled. This probably means
an out-of-range result or an invalid operation, such as division by zero.
test=# DO LANGUAGE plperl $$$$;
DO
Time: 172.235 ms
test=# SELECT (-2^31)::int/-1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#32)
Re: plperl sigfpe reset can crash the server

Andres Freund <andres@2ndquadrant.com> writes:

On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote:

While debugging an instance of this bug I noticed that plperlu always
removes the SIGFPE handler and sets it to ignore:

In fact it can be used to crash the server:

Um ... how exactly can that happen, if the signal is now ignored?

regards, tom lane

#34Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#33)
Re: plperl sigfpe reset can crash the server

On Friday, August 24, 2012 06:55:04 AM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote:

While debugging an instance of this bug I noticed that plperlu always

removes the SIGFPE handler and sets it to ignore:

In fact it can be used to crash the server:

Um ... how exactly can that happen, if the signal is now ignored?

Don't ask me the hard questions at 7 in the morning. I have no clue yet.

I don't see where but something resets SIGFPE before the server crashes. If I
catch the sigfpe with gdb I see:

test=# SELECT pg_backend_pid();
pg_backend_pid
----------------
18084

root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/18084/status
SigIgn: 0000000001301800
SigCgt: 0000000180006287

test=# SELECT (-2^31)::int/-1;
ERROR: floating-point exception
DETAIL: An invalid floating-point operation was signaled. This probably means
an out-of-range result or an invalid operation, such as division by zero.

root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/18084/status
SigIgn: 0000000001301800
SigCgt: 0000000180006287

test=# DO LANGUAGE plperl $$$$;

root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/18084/status
SigIgn: 0000000001301880
SigCgt: 0000000180006207

test=# SELECT (-2^31)::int/-1;

Program received signal SIGFPE, Arithmetic exception.
0x00007f858001f8c6 in int4div (fcinfo=0x7f8581b30320)

root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/18084/status
SigIgn: 0000000001301800
SigCgt: 0000000180006207

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#35Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#33)
Re: plperl sigfpe reset can crash the server

On Friday, August 24, 2012 06:55:04 AM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote:

While debugging an instance of this bug I noticed that plperlu always

removes the SIGFPE handler and sets it to ignore:

In fact it can be used to crash the server:

Um ... how exactly can that happen, if the signal is now ignored?

My man 2 signal tells me:
"According to POSIX, the behavior of a process is undefined after it ignores
a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by kill(2) or
raise(3)."

Killing the process is a kind of undefined behaviour ;)

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#36Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#35)
Re: plperl sigfpe reset can crash the server

On Friday, August 24, 2012 07:19:42 AM Andres Freund wrote:

On Friday, August 24, 2012 06:55:04 AM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote:

While debugging an instance of this bug I noticed that plperlu always

removes the SIGFPE handler and sets it to ignore:

In fact it can be used to crash the server:

Um ... how exactly can that happen, if the signal is now ignored?

My man 2 signal tells me:
"According to POSIX, the behavior of a process is undefined after it
ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by
kill(2) or raise(3)."

Killing the process is a kind of undefined behaviour ;)

And it's done explicitly in Linux:

In

./arch/x86/kernel/traps.c:
void math_error(struct pt_regs *regs, int error_code, int trapnr)
{
...
force_sig_info(SIGFPE, &info, task);
}

and

./kernel/signal.c:
* Force a signal that the process can't ignore: if necessary
* we unblock the signal and change any SIG_IGN to SIG_DFL.
*
* Note: If we unblock the signal, we always reset it to SIG_DFL,
* since we do not want to have a signal handler that was blocked
* be invoked when user space had explicitly blocked it.
*
* We don't want to have recursive SIGSEGV's etc, for example,
* that is why we also clear SIGNAL_UNKILLABLE.
*/
int
force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
...

Absolutely obvious. Imo sigaction should simply return -1 and set errno to
EINVAL if somebody sets SIGFPE to SIG_IGN then...

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#35)
Re: plperl sigfpe reset can crash the server

Andres Freund <andres@2ndquadrant.com> writes:

Um ... how exactly can that happen, if the signal is now ignored?

My man 2 signal tells me:
"According to POSIX, the behavior of a process is undefined after it ignores
a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by kill(2) or
raise(3)."

So I guess the real question there is: WTF is perl doing setting the
handling to SIG_IGN?

Even if you grant the proposition that perl knows what it's doing in
terms of its internal behavior, which given the above seems doubtful,
it has no business overriding a host application's signal settings
like that.

regards, tom lane

#38Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#37)
Re: plperl sigfpe reset can crash the server

On Friday, August 24, 2012 07:33:01 AM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

Um ... how exactly can that happen, if the signal is now ignored?

My man 2 signal tells me:
"According to POSIX, the behavior of a process is undefined after it
ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by
kill(2) or raise(3)."

So I guess the real question there is: WTF is perl doing setting the
handling to SIG_IGN?

Even if you grant the proposition that perl knows what it's doing in
terms of its internal behavior, which given the above seems doubtful,
it has no business overriding a host application's signal settings
like that.

./pod/perl581delta.pod:
At startup Perl blocks the SIGFPE signal away since there isn't much
Perl can do about it. Previously this blocking was in effect also for
programs executed from within Perl. Now Perl restores the original
SIGFPE handling routine, whatever it was, before running external
programs.

perl.h also has some tidbits:

/*
* initialise to avoid floating-point exceptions from overflow, etc
*/
#ifndef PERL_FPU_INIT
# ifdef HAS_FPSETMASK
# if HAS_FLOATINGPOINT_H
# include <floatingpoint.h>
# endif
/* Some operating systems have this as a macro, which in turn expands to a comma
expression, and the last sub-expression is something that gets calculated,
and then they have the gall to warn that a value computed is not used. Hence
cast to void. */
# define PERL_FPU_INIT (void)fpsetmask(0)
# else
# if defined(SIGFPE) && defined(SIG_IGN) && !defined(PERL_MICRO)
# define PERL_FPU_INIT PL_sigfpe_saved = (Sighandler_t) signal(SIGFPE, SIG_IGN)
# define PERL_FPU_PRE_EXEC { Sigsave_t xfpe; rsignal_save(SIGFPE, PL_sigfpe_saved, &xfpe);
# define PERL_FPU_POST_EXEC rsignal_restore(SIGFPE, &xfpe); }
# else
# define PERL_FPU_INIT

# endif
# endif
#endif
#ifndef PERL_FPU_PRE_EXEC
# define PERL_FPU_PRE_EXEC {
# define PERL_FPU_POST_EXEC }
#endif

That doesn't sound very well reasoned and especially not very well tested to
me.
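
Spelled out with plain sigaction() calls, the pattern those macros appear to implement looks roughly like the sketch below. The function names are invented for illustration and this is not perl's actual code. The point it shows is that the saved disposition only comes back in the pre-exec path, so a host that embeds the interpreter and never execs keeps running with SIGFPE ignored.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

typedef void (*sig_handler_fn)(int);

/* Disposition that was in effect before SIGFPE was set to ignore. */
static struct sigaction saved_sigfpe;

/* Roughly PERL_FPU_INIT: ignore SIGFPE and remember the old disposition. */
int
fpu_init_ignore_sigfpe(void)
{
    struct sigaction ignore;

    ignore.sa_handler = SIG_IGN;
    ignore.sa_flags = 0;
    sigemptyset(&ignore.sa_mask);
    return sigaction(SIGFPE, &ignore, &saved_sigfpe);
}

/* Roughly PERL_FPU_PRE_EXEC: reinstall the saved disposition and keep
 * the current (ignoring) one in *ignored for later. */
int
fpu_pre_exec(struct sigaction *ignored)
{
    return sigaction(SIGFPE, &saved_sigfpe, ignored);
}

/* Roughly PERL_FPU_POST_EXEC: go back to ignoring SIGFPE. */
int
fpu_post_exec(const struct sigaction *ignored)
{
    return sigaction(SIGFPE, ignored, NULL);
}

/* Inspection helper: the SIGFPE handler currently installed. */
sig_handler_fn
current_sigfpe_handler(void)
{
    struct sigaction cur;

    if (sigaction(SIGFPE, NULL, &cur) != 0)
        return SIG_ERR;
    return cur.sa_handler;
}
```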

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#38)
Re: plperl sigfpe reset can crash the server

Andres Freund <andres@2ndquadrant.com> writes:

./pod/perl581delta.pod:
At startup Perl blocks the SIGFPE signal away since there isn't much
Perl can do about it. Previously this blocking was in effect also for
programs executed from within Perl. Now Perl restores the original
SIGFPE handling routine, whatever it was, before running external
programs.

So there's a gap in the "restore" logic someplace.

perl.h also has some tidbits: ...
That doesn't sound very well reasoned and especially not very well tested to
me.

Time to file a Perl bug?

regards, tom lane

#40Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#39)
Re: plperl sigfpe reset can crash the server

On Friday, August 24, 2012 04:53:36 PM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

./pod/perl581delta.pod:
At startup Perl blocks the SIGFPE signal away since there isn't much
Perl can do about it. Previously this blocking was in effect also for
programs executed from within Perl. Now Perl restores the original
SIGFPE handling routine, whatever it was, before running external
programs.

So there's a gap in the "restore" logic someplace.

Well, the logic is not triggering at all in pg's case. It's just used if perl
is exec()ing something...

perl.h also has some tidbits: ...
That doesn't sound very well reasoned and especially not very well tested
to me.

Time to file a Perl bug?

Anybody more involved in the perl community volunteering?

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#41Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#40)
Re: plperl sigfpe reset can crash the server

On 08/24/2012 10:58 AM, Andres Freund wrote:

On Friday, August 24, 2012 04:53:36 PM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

./pod/perl581delta.pod:
At startup Perl blocks the SIGFPE signal away since there isn't much
Perl can do about it. Previously this blocking was in effect also for
programs executed from within Perl. Now Perl restores the original
SIGFPE handling routine, whatever it was, before running external
programs.

So there's a gap in the "restore" logic someplace.

Well, the logic is not triggering at all in pg's case. It's just used if perl
is exec()ing something...

perl.h also has some tidbits: ...
That doesn't sound very well reasoned and especially not very well tested
to me.

Time to file a Perl bug?

Anybody more involved in the perl community volunteering?

Just run perlbug and let us know the bug number.

cheers

andrew

#42Andres Freund
andres@2ndquadrant.com
In reply to: Andrew Dunstan (#41)
Re: plperl sigfpe reset can crash the server

On Friday, August 24, 2012 05:09:18 PM Andrew Dunstan wrote:

On 08/24/2012 10:58 AM, Andres Freund wrote:

On Friday, August 24, 2012 04:53:36 PM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

./pod/perl581delta.pod:
At startup Perl blocks the SIGFPE signal away since there isn't much
Perl can do about it. Previously this blocking was in effect also for
programs executed from within Perl. Now Perl restores the original
SIGFPE handling routine, whatever it was, before running external
programs.

So there's a gap in the "restore" logic someplace.

Well, the logic is not triggering at all in pg's case. It's just used if
perl is exec()ing something...

perl.h also has some tidbits: ...
That doesn't sound very well reasoned and especially not very well
tested to me.

Time to file a Perl bug?

Anybody more involved in the perl community volunteering?

Just run perlbug and let us know the bug number.

https://rt.perl.org/rt3/Public/Bug/Display.html?id=114574

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#43Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#39)
Re: plperl sigfpe reset can crash the server

On Friday, August 24, 2012 04:53:36 PM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

./pod/perl581delta.pod:
At startup Perl blocks the SIGFPE signal away since there isn't much
Perl can do about it. Previously this blocking was in effect also for
programs executed from within Perl. Now Perl restores the original
SIGFPE handling routine, whatever it was, before running external
programs.

So there's a gap in the "restore" logic someplace.

perl.h also has some tidbits: ...
That doesn't sound very well reasoned and especially not very well tested
to me.

Time to file a Perl bug?

We probably should work around that bug anyway, given that it's a pretty trivial
DoS using only a trusted language and it will take quite some time to push out
newer perl versions even if that bug gets fixed.

Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3 seems to
work. Is that acceptable?
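
A more generic variant of the same idea, sketched here under assumptions and not what is being proposed for plperl.c, is to snapshot the SIGFPE disposition before calling a library's init routine and put it back afterwards if the library changed it. All names below are hypothetical: clobbering_init() stands in for PERL_SYS_INIT3 and host_fpe_handler() for FloatExceptionHandler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

typedef void (*init_fn)(void);

/* Set by the stand-in host handler so callers can see that it ran. */
volatile sig_atomic_t host_fpe_seen = 0;

/* Stand-in for the host's own SIGFPE handler. */
void
host_fpe_handler(int signo)
{
    (void) signo;
    host_fpe_seen = 1;
}

/* Stand-in for a library init routine that sets SIGFPE to SIG_IGN. */
void
clobbering_init(void)
{
    signal(SIGFPE, SIG_IGN);
}

/*
 * Run init() and reinstall the previous SIGFPE disposition if init()
 * changed it. Assumes plain sa_handler-style handlers.
 * Returns 1 if the handler had to be restored, 0 if it was left alone,
 * -1 on error.
 */
int
init_preserving_sigfpe(init_fn init)
{
    struct sigaction before;
    struct sigaction after;

    if (sigaction(SIGFPE, NULL, &before) != 0)
        return -1;

    init();

    if (sigaction(SIGFPE, NULL, &after) != 0)
        return -1;
    if (after.sa_handler == before.sa_handler)
        return 0;
    if (sigaction(SIGFPE, &before, NULL) != 0)
        return -1;
    return 1;
}
```

Hard-coding the handler after the init call is simpler; the snapshot approach only pays off if the host does not know which handler was installed.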

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#44Alex Hunsaker
badalex@gmail.com
In reply to: Andres Freund (#43)
Re: plperl sigfpe reset can crash the server

On Fri, Aug 24, 2012 at 4:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:

We probably should work around that bug anyway, given that it's a pretty trivial
DoS using only a trusted language and it will take quite some time to push out
newer perl versions even if that bug gets fixed.

Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3 seems to
work. Is that acceptable?

Makes sense to me. (I have not looked to see if there is some perl
knob we can flip for this)

#45Andres Freund
andres@2ndquadrant.com
In reply to: Alex Hunsaker (#44)
Re: plperl sigfpe reset can crash the server

On Saturday, August 25, 2012 12:15:00 AM Alex Hunsaker wrote:

On Fri, Aug 24, 2012 at 4:10 PM, Andres Freund <andres@2ndquadrant.com>

wrote:

We probably should work around that bug anyway, given that it's a pretty
trivial DoS using only a trusted language and it will take quite some
time to push out newer perl versions even if that bug gets fixed.

Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3
seems to work. Is that acceptable?

Makes sense to me. (I have not looked to see if there is some perl
knob we can flip for this)

I couldn't find any. After some macro indirection the signal() call ends up
being made unconditionally by a compiled function (Perl_sys_init3), so I don't
think there is much that can be done without changing perl's source code...

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#43)
Re: plperl sigfpe reset can crash the server

Andres Freund <andres@2ndquadrant.com> writes:

Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3 seems to
work. Is that acceptable?

Surely that's breaking perl's expectations, to more or less the same
degree they're breaking ours?

regards, tom lane

#47Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#46)
Re: plperl sigfpe reset can crash the server

On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3
seems to work. Is that acceptable?

Surely that's breaking perl's expectations, to more or less the same
degree they're breaking ours?

Well. Their expectation simply does not work *at all* because they do
something (setting SIGFPE to SIG_IGN) which has been completely ignored on at
least one major platform (x86 Linux) for longer than the kernel has git history.

Their math code seems to work around generating such errors, but I find it
rather hard to read (or rather read & understand).

Doing what I proposed admittedly has the issue that we would jump out of perl
code without much ado. I have no idea what the proper perly way to do so is.
It just seems we should do something...

if (in_perl)
return;

Would be the equivalent of what they want?

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#48Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#46)
Re: plperl sigfpe reset can crash the server

On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3
seems to work. Is that acceptable?

Surely that's breaking perl's expectations, to more or less the same
degree they're breaking ours?

In the referenced bug they agree that this is the way forward.

There is the issue of corrupting the perl environment if you manage to
generate a SIGFPE - I couldn't so far - but I see no way around that other
than teaching the sigfpe handler to really ignore the error as perl wants.
Not sure if adding such ugliness is acceptable.

The issue that the handler does a longjmp out of external code is a general
problem btw. While pg will probably never create a sigfpe while in anything
critical, the same cannot be said about external code.
So anything external with persistent state can probably be made to crash or
similar.

Not sure if there is any real way out of this other than making the handler
FATAL if non-pg code is running.
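
To make the longjmp concern concrete, here is a minimal standalone sketch of a handler that jumps out of whatever code raised SIGFPE. It is not PostgreSQL's FloatExceptionHandler; the names are hypothetical and sigsetjmp()/siglongjmp() are used only to show the mechanism. Whatever was executing when the signal arrived is abandoned mid-flight, which is harmless for the single division here but not for external code that keeps persistent state.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Where the handler jumps back to after a trapped SIGFPE. */
static sigjmp_buf fpe_recovery_point;

static void
fpe_longjmp_handler(int signo)
{
    (void) signo;
    /* Abandon the faulting code and resume at the sigsetjmp() call. */
    siglongjmp(fpe_recovery_point, 1);
}

/*
 * Hypothetical helper: compute a / b. Returns 0 and stores the quotient
 * in *out on success, or -1 if the division trapped with SIGFPE (or the
 * handler could not be installed).
 */
int
guarded_div(int a, int b, int *out)
{
    struct sigaction sa;
    struct sigaction old;
    volatile int num = a;
    volatile int den = b;
    volatile int rc = 0;

    sa.sa_handler = fpe_longjmp_handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGFPE, &sa, &old) != 0)
        return -1;

    /* savemask = 1 so that SIGFPE is unblocked again after the jump. */
    if (sigsetjmp(fpe_recovery_point, 1) == 0)
        *out = num / den;       /* may trap, e.g. on x86 when den == 0 */
    else
        rc = -1;                /* reached via siglongjmp() */

    sigaction(SIGFPE, &old, NULL);
    return rc;
}
```

On CPUs where integer division by zero does not trap, guarded_div(1, 0, &q) returns 0 with an unspecified quotient instead of -1.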

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#49Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#48)
1 attachment(s)
Re: plperl sigfpe reset can crash the server

On Sunday, August 26, 2012 06:10:02 PM Andres Freund wrote:

On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3
seems to work. Is that acceptable?

Surely that's breaking perl's expectations, to more or less the same
degree they're breaking ours?

In the referenced bug they agree that this is the way forward.

As nobody has any better ideas here is a patch doing that:

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Reset-SIGFPE-handler-after-plperl-initialization.patch (text/x-patch; charset=UTF-8)
From c5583861a98c6e5c26961d6346c5b5abc699f90d Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 5 Sep 2012 16:04:41 +0200
Subject: [PATCH] Reset SIGFPE handler after plperl initialization

Unfortunately perl resets the sigfpe handler to SIG_IGN which is
bad for two reasons: First, we don't get a nice error message
anymore if a SIGFPE is generated via math on the sql level,
secondly setting SIGFPE to SIG_IGN is strongly discouraged by
posix and invokes undefined behaviour according to it.
At least linux defines this undefined behaviour as resetting the
SIGFPE handler and killing the triggering process.

In perl bug 114574 the perl developers agree that the correct
approach is to just reset the SIGFPE handler.

On some platforms this fixes a server crash with: SELECT (-(2^31))::int/-1;
---
 src/pl/plperl/plperl.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index b31e965..f4b2fa9 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -28,6 +28,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_type.h"
 #include "storage/ipc.h"
+#include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
@@ -743,6 +744,21 @@ plperl_init_interp(void)
 			perl_sys_init_done = 1;
 			/* quiet warning if PERL_SYS_INIT3 doesn't use the third argument */
 			dummy_env[0] = NULL;
+
+			/*
+			 * Unfortunately perl resets the sigfpe handler to SIG_IGN which is
+			 * bad for two reasons: First, we don't get a nice error message
+			 * anymore if a SIGFPE is generated via math on the sql level,
+			 * secondly setting SIGFPE to SIG_IGN is strongly discouraged by
+			 * posix and invokes undefined behaviour according to it.
+			 * At least linux defines this undefined behaviour as resetting the
+			 * SIGFPE handler and killing the triggering process.
+			 *
+			 * In perl bug 114574 the perl developers agree that the correct
+			 * approach is to just reset the SIGFPE handler.
+			 */
+			pqsignal(SIGFPE, FloatExceptionHandler);
+
 		}
 	}
 #endif
-- 
1.7.10.4
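
For context on the query in the commit message: (-(2^31))::int / -1 divides the smallest 32-bit integer by -1, and the true quotient 2^31 does not fit in a 32-bit int. On x86 the division instruction raises a divide error for that case, which is delivered as SIGFPE. The sketch below is a hypothetical helper, not PostgreSQL's integer division code, and is separate from the handler fix in the patch; it only shows how both trapping cases can be caught before the division is executed.

```c
#include <stdint.h>

enum div_status
{
    DIV_OK = 0,
    DIV_BY_ZERO = 1,
    DIV_OVERFLOW = 2
};

/*
 * Hypothetical helper: divide a by b without ever executing a division
 * that could trap. On DIV_OK the quotient is stored in *out; otherwise
 * *out is left untouched.
 */
enum div_status
checked_div_i32(int32_t a, int32_t b, int32_t *out)
{
    if (b == 0)
        return DIV_BY_ZERO;

    /* INT32_MIN / -1 would be 2^31, which int32_t cannot represent. */
    if (a == INT32_MIN && b == -1)
        return DIV_OVERFLOW;

    *out = a / b;
    return DIV_OK;
}
```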

#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#49)
Re: plperl sigfpe reset can crash the server

Andres Freund <andres@2ndquadrant.com> writes:

On Sunday, August 26, 2012 06:10:02 PM Andres Freund wrote:

On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote:

Surely that's breaking perl's expectations, to more or less the same
degree they're breaking ours?

In the referenced bug they agree that this is the way forward.

As nobody has any better ideas here is a patch doing that:

OK. Do we want to commit this now, or wait till after 9.2.0?
My feeling is it's probably okay to include in 9.2.0, but I can see
that somebody might want to argue not to. Any objections out there?

regards, tom lane

#51Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#50)
Re: plperl sigfpe reset can crash the server

On Wednesday, September 05, 2012 07:15:52 PM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On Sunday, August 26, 2012 06:10:02 PM Andres Freund wrote:

On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote:

Surely that's breaking perl's expectations, to more or less the same
degree they're breaking ours?

In the referenced bug they agree that this is the way forward.

As nobody has any better ideas here is a patch doing that:

OK. Do we want to commit this now, or wait till after 9.2.0?
My feeling is it's probably okay to include in 9.2.0, but I can see
that somebody might want to argue not to. Any objections out there?

Perhaps unsurprisingly I would argue for including it. I am not saying it's a
perfect solution, but not bandaiding seems to open a bigger hole/DoS. Given
that any occurrence of SIGFPE inside perl on linux in the last 10 years or so
would have led to perl (including postgres w. plperl[u]) getting killed with
a somewhat distinctive message, and given the lack of reports I could find
about it, the risk doesn't seem to be too big.

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#51)
Re: plperl sigfpe reset can crash the server

Andres Freund <andres@2ndquadrant.com> writes:

On Wednesday, September 05, 2012 07:15:52 PM Tom Lane wrote:

OK. Do we want to commit this now, or wait till after 9.2.0?
My feeling is it's probably okay to include in 9.2.0, but I can see
that somebody might want to argue not to. Any objections out there?

Perhaps unsurprisingly I would argue for including it. I am not saying it's a
perfect solution, but not bandaiding seems to open a bigger hole/DoS. Given
that any occurrence of SIGFPE inside perl on linux in the last 10 years or so
would have led to perl (including postgres w. plperl[u]) getting killed with
a somewhat distinctive message, and given the lack of reports I could find
about it, the risk doesn't seem to be too big.

Hearing no objections, committed and back-patched.

regards, tom lane