Questions regarding signal handler of postmaster

Started by Tatsuo Ishii about 9 years ago, 5 messages
#1 Tatsuo Ishii
ishii@sraoss.co.jp

In postmaster.c, the signal handler pmdie() calls ereport() and
errmsg_internal(), which could call palloc() and then malloc() if
necessary. Because pmdie() can be invoked while the postmaster is
inside malloc(), I think a deadlock could occur on the internal lock
that malloc() takes. I have not observed this exact case in
PostgreSQL, but I see a suspected case in Pgpool-II. In the stack
trace below, malloc() is called by Pgpool-II at #14. It is interrupted
by a signal, the signal handler at #11 calls malloc() again, and it is
stuck at #0.

So my question is: is my concern about PostgreSQL valid?
If so, how can we fix it?

#0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1 0x00007f67fe20ccba in _L_lock_12808 () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f67fe20a6b5 in __GI___libc_malloc (bytes=15) at malloc.c:2887
#3 0x00007f67fe21072a in __GI___strdup (s=0x7f67fe305dd8 "/etc/localtime") at strdup.c:42
#4 0x00007f67fe239f51 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1)
    at tzset.c:444
#5 0x00007f67fe23a913 in __tz_convert (timer=timer@entry=0x7ffce1c1b7f8,
    use_localtime=use_localtime@entry=1, tp=tp@entry=0x7f67fe54bde0 <_tmbuf>) at tzset.c:632
#6 0x00007f67fe2387d1 in __GI_localtime (t=t@entry=0x7ffce1c1b7f8) at localtime.c:42
#7 0x000000000045627b in log_line_prefix (buf=buf@entry=0x7ffce1c1b8d0, line_prefix=<optimized out>,
    edata=<optimized out>) at ../../src/utils/error/elog.c:2059
#8 0x000000000045894d in send_message_to_server_log (edata=0x753320 <errordata>)
    at ../../src/utils/error/elog.c:2084
#9 EmitErrorReport () at ../../src/utils/error/elog.c:1129
#10 0x0000000000456d8e in errfinish (dummy=<optimized out>) at ../../src/utils/error/elog.c:434
#11 0x0000000000421f57 in die (sig=2) at protocol/child.c:925
#12 <signal handler called>
#13 _int_malloc (av=0x7f67fe546760 <main_arena>, bytes=4176) at malloc.c:3302
#14 0x00007f67fe20a6c0 in __GI___libc_malloc (bytes=4176) at malloc.c:2891
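
For readers following the trace: one common way to avoid this kind of re-entrant malloc() call is to keep the handler async-signal-safe and defer logging to the main loop. The following is only a minimal sketch of that general technique, not code from PostgreSQL or Pgpool-II; the names shutdown_requested, request_shutdown(), install_shutdown_handler() and handle_pending_shutdown() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag; the only state the handler ever touches. */
static volatile sig_atomic_t shutdown_requested = 0;

/* Async-signal-safe handler: no malloc(), no stdio, no localtime(). */
void request_shutdown(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/* Install the handler for SIGINT; returns 0 on success, -1 on failure. */
int install_shutdown_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = request_shutdown;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, NULL);
}

/*
 * Called from the normal main loop, outside signal context, where
 * logging (and therefore malloc()) is safe. Returns 1 if a shutdown
 * request was pending and has now been handled, 0 otherwise.
 */
int handle_pending_shutdown(void)
{
    if (!shutdown_requested)
        return 0;
    shutdown_requested = 0;
    fprintf(stderr, "shutdown request received\n");
    return 1;
}
```

With this shape, a signal that arrives while the main code is inside malloc() only sets the flag; the message is written later, once the interrupted malloc() call has returned.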

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#1)
Re: Questions regarding signal handler of postmaster

Tatsuo Ishii <ishii@sraoss.co.jp> writes:

> In postmaster.c signal handler pmdie() calls ereport() and
> errmsg_internal(), which could call palloc() then malloc() if
> necessary. Because it is possible that pmdie() gets called while
> malloc() gets called in postmaster, I think it is possible that a
> deadlock situation could occur through an internal locking inside
> malloc().

But we keep signals blocked almost all the time in the postmaster,
so in reality no signal handler can interrupt anything except the
select() wait call. People complain about that coding technique
all the time, but no one has presented any reason to believe that
it's broken.
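
The pattern described above, keeping signals blocked and opening a window only around the wait call, can be sketched roughly as follows. This is an illustration under stated assumptions, not the postmaster's actual code: SIGUSR1 stands in for whichever signals are handled, and got_signal, note_signal(), setup_blocked_signal(), wait_with_signals_open() and signal_was_seen() are hypothetical names. The handler here only sets a flag to keep the sketch small.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>

/* Hypothetical flag recording that the handler ran. */
static volatile sig_atomic_t got_signal = 0;

void note_signal(int signo)
{
    (void) signo;
    got_signal = 1;
}

/* Install the handler, then block SIGUSR1 for normal execution. */
int setup_blocked_signal(void)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    return sigprocmask(SIG_BLOCK, &block, NULL);
}

/*
 * The only window in which the handler may run: unblock, wait in
 * select() for up to timeout_usec microseconds, then restore the
 * previous (blocking) mask. Returns select()'s result.
 */
int wait_with_signals_open(long timeout_usec)
{
    sigset_t open_set, saved;
    struct timeval tv;
    int rc;

    sigemptyset(&open_set);
    sigaddset(&open_set, SIGUSR1);

    sigprocmask(SIG_UNBLOCK, &open_set, &saved);
    tv.tv_sec = timeout_usec / 1000000;
    tv.tv_usec = timeout_usec % 1000000;
    rc = select(0, NULL, NULL, NULL, &tv);
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return rc;
}

int signal_was_seen(void)
{
    return got_signal != 0;
}
```

A signal sent while the mask is in place stays pending and is delivered only once wait_with_signals_open() unblocks it, so the handler cannot interrupt code that is in the middle of malloc().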

regards, tom lane

#3 Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Tatsuo Ishii (#1)
Re: Questions regarding signal handler of postmaster

From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tatsuo Ishii

> In postmaster.c signal handler pmdie() calls ereport() and
> errmsg_internal(), which could call palloc() then malloc() if necessary.
> Because it is possible that pmdie() gets called while
> malloc() gets called in postmaster, I think it is possible that a deadlock
> situation could occur through an internal locking inside malloc(). I have
> not observed the exact case in PostgreSQL but I see a suspected case in
> Pgpool-II. In the stack trace #14, malloc() is called by Pgpool-II. It is
> interrupted by a signal in #11, and the signal handler calls malloc() again,
> and it is stuck at #0.

I encountered that problem with postmaster and fixed it in 9.4.0 (it's not back-patched to earlier releases because it's relatively complex).

/messages/by-id/20DAEA8949EC4E2289C6E8E58560DEC0@maumau

[Excerpt from 9.4 release note]
During crash recovery or immediate shutdown, send uncatchable termination signals (SIGKILL) to child processes that do not shut down promptly (MauMau, Álvaro Herrera)
This reduces the likelihood of leaving orphaned child processes behind after postmaster shutdown, as well as ensuring that crash recovery can proceed if some child processes have become “stuck”.

Regards
Takayuki Tsunakawa

#4 Tatsuo Ishii
ishii@sraoss.co.jp
In reply to: Tsunakawa, Takayuki (#3)
Re: Questions regarding signal handler of postmaster

> I encountered that problem with postmaster and fixed it in 9.4.0 (it's not back-patched to earlier releases because it's relatively complex).
>
> /messages/by-id/20DAEA8949EC4E2289C6E8E58560DEC0@maumau
>
> [Excerpt from 9.4 release note]
> During crash recovery or immediate shutdown, send uncatchable termination signals (SIGKILL) to child processes that do not shut down promptly (MauMau, Álvaro Herrera)
> This reduces the likelihood of leaving orphaned child processes behind after postmaster shutdown, as well as ensuring that crash recovery can proceed if some child processes have become “stuck”.

Looks wild to me :-) I hope there is a better way to solve the problem...

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

#5 Tatsuo Ishii
ishii@sraoss.co.jp
In reply to: Tom Lane (#2)
Re: Questions regarding signal handler of postmaster

> But we keep signals blocked almost all the time in the postmaster,
> so in reality no signal handler can interrupt anything except the
> select() wait call. People complain about that coding technique
> all the time, but no one has presented any reason to believe that
> it's broken.

OK, there seems to be no better solution than always blocking signals.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
