How to shoot yourself in the foot: kill -9 postmaster
I have spent several days now puzzling over the corrupted WAL logfile
that Scott Parish was kind enough to send me from a 7.1beta4 crash.
It looks a lot like two different series of transactions were getting
written into the same logfile. I'd been digging like mad in the WAL
code to try to explain this as a buffer-management logic error, but
after a fresh exchange of info it turns out that I was barking up the
wrong tree. There *were* two different series of transactions.
Specifically, here's what happened:
1. Scott (or actually his associate) shut down and restarted the
postmaster using the /etc/rc.d/init.d/pgsql script that ships with
our RPMs. That script shuts down the old postmaster with
killproc postmaster
It turns out that at least on Scott's machine (RedHat 6.1), the default
kill level for the killproc function is kill -9. (This is clearly a bad
bug in the init script, but I digress.)
2. So, the old postmaster was killed with kill -9, but its child
backends were still running. The new postmaster will start up
successfully because it'll think the old postmaster crashed, and
so it will go through the usual recovery procedure.
3. Now we have two sets of backends running in different shmem blocks
(7.0 might have choked on that part, but 7.1 doesn't care) and running
different sets of transactions. But they're writing to the same WAL
log. Result: guaranteed corruption of the log.
It actually took two iterations of this to expose the bug: the third
attempted postmaster start went looking for the checkpoint record last
written by the second one, which meanwhile had got overwritten by
activity of the first backend set.
Now, killing the postmaster -9 and not cleaning up the backends has
always been a good way to shoot yourself in the foot, but up to now the
worst thing that was likely to happen to you was isolated corruption in
specific tables. In the brave new world of WAL the stakes are higher,
because the system will refuse to start up if it finds a corrupted
checkpoint record. Clueless admins who resort to kill -9 as a routine
admin tool *will* lose their databases. Moreover, the init scripts
that are running around now are dangerous weapons if used with 7.1.
I think we need a stronger interlock to prevent this scenario, but I'm
unsure what it should be. Ideas?
regards, tom lane
At 3/5/2001 04:30 PM, you wrote:
Now, killing the postmaster -9 and not cleaning up the backends has
always been a good way to shoot yourself in the foot, but up to now the
worst thing that was likely to happen to you was isolated corruption in
specific tables. In the brave new world of WAL the stakes are higher,
because the system will refuse to start up if it finds a corrupted
checkpoint record. Clueless admins who resort to kill -9 as a routine
admin tool *will* lose their databases. Moreover, the init scripts
that are running around now are dangerous weapons if used with 7.1.
I think we need a stronger interlock to prevent this scenario, but I'm
unsure what it should be. Ideas?
Is there any way to see if the other (child) processes have a lock on the
log file?
On a lot of systems, when a daemon starts, it will record its PID in a file so
that it (or the admin) can run a 'shutdown' script against the listed PID.
Can child processes list themselves like child.PID in a configurable
directory, and have the starting process look for all of these and shut the
"orphaned" child processes down?
Just thoughts...
Thomas
* Tom Lane <tgl@sss.pgh.pa.us> [010305 14:51] wrote:
I think we need a stronger interlock to prevent this scenario, but I'm
unsure what it should be. Ideas?
Re having multiple postmasters active by accident.
The sysV IPC stuff has some hooks in it that may help you.
One idea is to check the 'struct shmid_ds' field 'shm_nattch',
basically at startup if it's not 1 (or 0) then you have more than
one postgresql instance messing with it and it should not proceed.
I'd also suggest looking into using sysV semaphores and the semundo
stuff, afaik it can be used to track the number of consumers of
a resource.
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
Tom Lane wrote:
checkpoint record. Clueless admins who resort to kill -9 as a routine
admin tool *will* lose their databases. Moreover, the init scripts
that are running around now are dangerous weapons if used with 7.1.
Thanks for the headsup, Tom. Time to nix killproc and do something
cleaner -- compatible, but cleaner. I'll have to research what the
defaults are for later RH's -- but, as 6.1 is one of my target platforms
at this time, I have to fix that issue for sure.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
Lamar Owen <lamar.owen@wgcr.org> writes:
Thanks for the headsup, Tom. Time to nix killproc and do something
cleaner -- compatible, but cleaner.
As far as I could tell from the 6.1 scripts, it would work to do
killproc postmaster -TERM
The problem is just that killproc has an overenthusiastic default...
regards, tom lane
killproc should send a kill -15 to the process, wait a few seconds for
it to exit. If it does not, try kill -1, and if that doesn't kill it,
then kill -9.
Tom Lane wrote:
checkpoint record. Clueless admins who resort to kill -9 as a routine
admin tool *will* lose their databases. Moreover, the init scripts
that are running around now are dangerous weapons if used with 7.1.
Thanks for the headsup, Tom. Time to nix killproc and do something
cleaner -- compatible, but cleaner. I'll have to research what the
defaults are for later RH's -- but, as 6.1 is one of my target platforms
at this time, I have to fix that issue for sure.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Lamar Owen <lamar.owen@wgcr.org> writes:
Thanks for the headsup, Tom. Time to nix killproc and do something
cleaner -- compatible, but cleaner.
As far as I could tell from the 6.1 scripts, it would work to do
killproc postmaster -TERM
Yes, amazing it has a -9 default.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
killproc should send a kill -15 to the process, wait a few seconds for
it to exit. If it does not, try kill -1, and if that doesn't kill it,
then kill -9.
Tell it to the Linux people ... this is their boot-script code we're
talking about.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
killproc should send a kill -15 to the process, wait a few seconds for
it to exit. If it does not, try kill -1, and if that doesn't kill it,
then kill -9.
Tell it to the Linux people ... this is their boot-script code we're
talking about.
RedHat, in particular. I can't vouch for any others.
On my RH 6.2 box, with initscripts-5.00-1 loaded, here's what killproc
does if no killlevel is set (even though a default $killlevel is set to
-9, it's not used in this code):
($pid is the pid of the proc to kill, $base is the name of the proc,
etc)
if [ "$notset" = "1" ] ; then
    if ps h $pid>/dev/null 2>&1; then
        # TERM first, then KILL if not dead
        kill -TERM $pid
        usleep 100000
        if ps h $pid >/dev/null 2>&1 ; then
            sleep 1
            if ps h $pid >/dev/null 2>&1 ; then
                sleep 3
                if ps h $pid >/dev/null 2>&1 ; then
                    kill -KILL $pid
                fi
            fi
        fi
    fi
    ps h $pid >/dev/null 2>&1
    RC=$?
    [ $RC -eq 0 ] && failure "$base shutdown" || success "$base shutdown"
    RC=$((! $RC))
# use specified level only
else
    if ps h $pid >/dev/null 2>&1; then
        kill $killlevel $pid
        RC=$?
        [ $RC -eq 0 ] && success "$base $killlevel" || failure "$base $killlevel"
    fi
fi
Is 6.1 this different from 6.2? This code on the surface seems
reasonable to me -- am I missing something? The 6.2 code (found in
/etc/rc.d/init.d/functions, for those who might not know where to find
killproc) sets a default killlevel but never uses it -- ignorant but not
stupid.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
if [ "$notset" = "1" ] ; then
    if ps h $pid>/dev/null 2>&1; then
        # TERM first, then KILL if not dead
        kill -TERM $pid
        usleep 100000
        if ps h $pid >/dev/null 2>&1 ; then
            sleep 1
            if ps h $pid >/dev/null 2>&1 ; then
                sleep 3
                if ps h $pid >/dev/null 2>&1 ; then
                    kill -KILL $pid
                fi
            fi
        fi
    fi
Yes, this seems like the proper way to do it.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Mon, Mar 05, 2001 at 08:55:41PM -0500, Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
killproc should send a kill -15 to the process, wait a few seconds for
it to exit. If it does not, try kill -1, and if that doesn't kill it,
then kill -9.
Tell it to the Linux people ... this is their boot-script code we're
talking about.
Not to be a zealot, but this isn't _Linux_ boot-script code, it's
_Red Hat_ boot-script code. Red Hat would like for us all to confuse
the two, but they jes' ain't the same. (As a rule of thumb, where it
works right, credit Linux; where it doesn't, blame Red Hat. :-)
Nathan Myers
ncm@zembu.com
Tom Lane wrote:
Now, killing the postmaster -9 and not cleaning up the backends has
always been a good way to shoot yourself in the foot, but up to now the
worst thing that was likely to happen to you was isolated corruption in
specific tables. In the brave new world of WAL the stakes are higher,
because the system will refuse to start up if it finds a corrupted
checkpoint record. Clueless admins who resort to kill -9 as a routine
admin tool *will* lose their databases. Moreover, the init scripts
that are running around now are dangerous weapons if used with 7.1.
I think we need a stronger interlock to prevent this scenario, but I'm
unsure what it should be. Ideas?
Seems the simplest way is to inhibit starting postmaster
if the pid file exists.
Another way is to use flock() if flock() is available.
We could flock() the pid file so that another postmaster
could detect the lock of the file.
Regards,
Hiroshi Inoue
Bruce Momjian wrote:
# TERM first, then KILL if not dead
Yes, this seems like the proper way to do it.
Now to verify that 6.1 is the same....or different.... Hmmmm.... The
mirrors of ftp.redhat.com (and, in fact, RedHat.com itself) no longer
have the updates or the original for 6.1's initscripts-4.70 package.
Can a RedHat 6.1 user (using as close as possible to 6.1's release
initscripts package) send me a copy of /etc/rc.d/init.d/functions, or
verify how that initscripts package defines killproc? I cannot at this
moment locate my RH 6.1 SRPMS CD. Found my RH _4_.1 CD, but that's just
a _little_ old :-).
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
Tom Lane wrote:
I think we need a stronger interlock to prevent this scenario, but I'm
unsure what it should be. Ideas?
Seems the simplest way is to inhibit starting postmaster
if the pid file exists.
Then we're unable to recover from a crash without manual intervention.
The tricky part of this is not to give up the ability to restart when
there *has* been a crash.
Another way is to use flock() if flock() is available.
We could flock() the pid file so that another postmaster
could detect the lock of the file.
This would only work if every backend is holding flock on the file,
which would mean they'd all have to keep it open all the time. Kind
of annoying to use up that many file descriptors on it. Might be the
best answer though; I haven't thought of anything I like better...
regards, tom lane
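[Editorial sketch, not part of the original thread.] The pid-file idea and
Tom's objection to it can be made concrete with a small shell helper. This is
only a sketch under stated assumptions: the function name pidfile_is_live is
hypothetical, and the pid file is assumed to hold the postmaster's PID on its
first line. It treats a pid file whose process is gone as stale, so a restart
after a real crash still works without manual intervention. Note that it only
checks the postmaster itself; it does not detect orphaned backends, so on its
own it would not have prevented the corruption described at the top of this
thread.

```shell
#!/bin/sh
# Hypothetical helper (illustration only, not from any shipped script).
# Returns 0 if the PID recorded in the pid file belongs to a live process,
# 1 if the file is missing, unreadable, malformed, or stale.
pidfile_is_live() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1

    # Assumption: the PID is the first line of the file.
    pid=$(head -n 1 "$pidfile")

    # Reject anything that is not a plain decimal number.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac

    # Signal 0 sends nothing; it only tests whether we could signal the
    # process. Caveat: this also fails if the process exists but is owned
    # by another user and we lack permission.
    kill -0 "$pid" 2>/dev/null
}
```

A start script could refuse to launch when pidfile_is_live succeeds and
otherwise remove the stale file and proceed. Closing the "postmaster down,
backends alive" gap needs something that every backend participates in, such
as the flock() or shm_nattch ideas raised elsewhere in the thread.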
Nathan Myers wrote:
Not to be a zealot, but this isn't _Linux_ boot-script code, it's
_Red Hat_ boot-script code. Red Hat would like for us all to confuse
the two, but they jes' ain't the same. (As a rule of thumb, where it
works right, credit Linux; where it doesn't, blame Red Hat. :-)
So we're going to credit Linux for PostgreSQL being shipped as part of
the RedHat distribution since RH 5.0, then? :-0
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
Lamar Owen <lamar.owen@wgcr.org> writes:
Is 6.1 this different from 6.2?
Scott sent me a copy of /etc/init.d/functions from his box, and it has
largely the same behavior (I hadn't read the whole code to notice that
it doesn't use the default killlevel...). What's actually happening
here is that the init script sends SIGTERM, and then SIGKILL four
seconds later if the postmaster hasn't shut down yet. Unfortunately,
unless your clients are very short-lived, four seconds isn't going to
be enough for a "polite" shutdown. (It's pretty marginal even for
an impolite one, since a checkpoint will take at least a couple of
seconds.)
However, with an explicit kill level that doesn't happen: you get one
signal of the specified value, no more. Possibly it would be better for
the init script to send SIGINT (forcibly disconnect clients) instead of
SIGTERM, however. So I'm now leaning to "killproc postmaster -INT".
regards, tom lane
Tom Lane wrote:
The tricky part of this is not to give up the ability to restart when
there *has* been a crash.
But kill -9 effectively _is_ an admin-initiated crash.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
Lamar Owen <lamar.owen@wgcr.org> writes:
Tom Lane wrote:
The tricky part of this is not to give up the ability to restart when
there *has* been a crash.
But kill -9 effectively _is_ an admin-initiated crash.
Yeah, but only a partial crash. If the admin finishes the job by
killing the backends too, we're fine. Postmaster down, backends alive
is not a scenario we're currently prepared for. We need a way to plug
that gap.
regards, tom lane
Tom Lane wrote:
However, with an explicit kill level that doesn't happen: you get one
signal of the specified value, no more. Possibly it would be better for
the init script to send SIGINT (forcibly disconnect clients) instead of
SIGTERM, however. So I'm now leaning to "killproc postmaster -INT".
Ok, since I can't seem to count on killproc's exact behavior, istm that
I can:
killproc postmaster -INT
wait some number of seconds
if postmaster still up
    killproc postmaster -TERM
    wait some number of seconds
    if postmaster STILL up
        killproc postmaster   # and let the grim reaper do its dirty work.
After all, the system shutdown is relying on this script to properly and
thoroughly shut things down, or it WILL do the 'kill -9
pid-of-postmaster' for you.
Now, what's a good delay here? Or is there a better metric than a
simple delay? After all, I want to avoid the kill -9 unless we have an
emergency hard lock situation -- what's a good indicator of the backend
fleet of processes actually _doing_ something? Or should I key on an
indicator of processor speed (Linux does provide a nice bogus metric
known as BogoMIPS for such a purpose)? The last thing I want to do is
wait too long on some platforms and not long enough on others.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
Lamar Owen <lamar.owen@wgcr.org> writes:
The last thing I want to do is
wait too long on some platforms and not long enough on others.
The difficulty is to know how long the final checkpoint will take.
This depends on (at least) your hard disk speed and the number of
dirty buffers, so I think you're going to have some difficulty
estimating it with any reliability. BogoMIPS won't help, for sure.
However, if you do SIGINT and then wait a few seconds, you can be fairly
sure that all the extant backends are dead (if not frozen up...) and
that the checkpoint is in progress. That may be about the best you can
do.
I do not agree that this script should take it on itself to kill -9 the
postmaster. Please note that the reason we're having this discussion at
all is that the init script may be used for purposes other than system
shutdown. So the argument that "it's going to happen anyway" is wrong.
regards, tom lane
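[Editorial sketch, not part of the original thread.] The approach Tom argues
for here (send SIGINT, wait, and never escalate to kill -9 from the script)
might look roughly like the following. The function name stop_politely and the
60-second default are hypothetical, illustrative choices; as the message above
points out, no fixed delay can reliably cover the final checkpoint, so the
timeout is only a point at which the script gives up and reports failure.

```shell
#!/bin/sh
# Hypothetical helper (illustration only, not from any shipped script).
# Sends SIGINT to the given PID, then polls once per second until the
# process is gone or the timeout expires. It deliberately never sends
# SIGKILL; on timeout it reports failure and leaves the decision to the admin.
stop_politely() {
    pid=$1
    timeout=${2:-60}    # seconds; arbitrary illustrative default

    # If the signal cannot be delivered, assume the process is already gone.
    kill -INT "$pid" 2>/dev/null || return 0

    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "process $pid still running after ${timeout}s; not sending SIGKILL" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

An init script would call this with the postmaster's PID and return a failure
status on timeout rather than forcing the issue, which keeps the script safe
to use for restarts as well as for system shutdown.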