Possible better pg_ctl start/stop handling?
Hello,
Interesting problem with pg_ctl. We have run into this consistently, as I
am sure a lot of other people have. If PostgreSQL does not get shut down
correctly, the postmaster.pid file is still in PGDATA. This of course
causes problems starting up (and it should).
However, it seems that pg_ctl, if issued a stop, should be able to remove
the file. Below is a specific example:
bash-3.00$ bin/pg_ctl -D data start
pg_ctl: another postmaster may be running; trying to start postmaster anyway
LOG: could not load root certificate file "root.crt": No such file or
directory
DETAIL: Will not verify client certificates.
FATAL: pre-existing shared memory block (key 5432001, ID 19202077) is
still in use
HINT: If you're sure there are no old server processes still running,
remove the shared memory block with the command "ipcclean", "ipcrm", or
just delete the file "postmaster.pid".
pg_ctl: could not start postmaster
Examine the log output.
bash-3.00$ bin/pg_ctl -D data stop
pg_ctl: could not send stop signal (PID: 10180): No such process
bash-3.00$
As we can see pg_ctl knows that the PID does not exist. If the PID does
not exist is it safe to assume that we can remove the file? So that we
may start again?
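For anyone following along, the "No such process" result above amounts to a liveness probe on the PID recorded in the file. Here is a minimal Python sketch of that probe, not pg_ctl's actual C code. It assumes a POSIX system and that the first line of postmaster.pid holds the postmaster's PID; the helper names read_pid and pid_is_alive are made up for illustration.

```python
import os


def read_pid(pid_file):
    """Return the PID stored on the first line of a postmaster.pid-style file."""
    with open(pid_file) as f:
        return int(f.readline().strip())


def pid_is_alive(pid):
    """Probe a PID with signal 0: nothing is delivered, only existence is checked."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH, the same "No such process" condition pg_ctl reported.
        return False
    except PermissionError:
        # EPERM: the process exists but belongs to another user.
        return True
    return True
```

A False result only says that the recorded process is gone. It says nothing about whether other processes are still using the data directory.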
Sincerely,
Joshua D. Drake
--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
"Joshua D. Drake" <jd@commandprompt.com> writes:
FATAL: pre-existing shared memory block (key 5432001, ID 19202077) is
still in use
HINT: If you're sure there are no old server processes still running,
remove the shared memory block with the command "ipcclean", "ipcrm", or
just delete the file "postmaster.pid".
As we can see pg_ctl knows that the PID does not exist. If the PID does
not exist is it safe to assume that we can remove the file? So that we
may start again?
The error message is warning you that there appear to still be live
backends in the data directory, even though the original postmaster
process is gone (crashed?). If that is the case, forcibly starting a
new postmaster is a fine recipe for creating unrecoverable data
corruption. So having pg_ctl auto-remove the file is horribly dangerous
and is NOT going to happen.
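To make the concern concrete: a shared memory segment can still have processes attached after the process that created it has died. One rough way to look for that from outside the server is to read the attach count from ipcs. The Python sketch below parses Linux-style "ipcs -m" text; it is an illustration, not the test the postmaster itself performs. The column layout (key, shmid, owner, perms, bytes, nattch) is assumed to be the util-linux one, and the sample row is invented apart from the key and ID taken from the error message above.

```python
def parse_ipcs_shm(ipcs_output):
    """Parse Linux-style `ipcs -m` text into a {shmid: nattch} mapping."""
    segments = {}
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows start with a hex key such as 0x0052e2c1;
        # banner, header, and blank lines are skipped.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        shmid, nattch = int(fields[1]), int(fields[5])
        segments[shmid] = nattch
    return segments


def segment_has_attachments(ipcs_output, shmid):
    """True if the given segment exists and at least one process is attached."""
    return parse_ipcs_shm(ipcs_output).get(shmid, 0) > 0


# Invented sample input (NOT output from the machine in this thread).
# Only the key (5432001 == 0x0052e2c1) and the ID come from the error message.
SAMPLE = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x0052e2c1 19202077   postgres   600        10461184   3
"""

if __name__ == "__main__":
    print(segment_has_attachments(SAMPLE, 19202077))
```

A nonzero nattch for the segment named in the error message would mean something is still attached, which is exactly the situation where removing postmaster.pid is unsafe.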
How did you get into this state anyway?
regards, tom lane
Tom Lane wrote:
"Joshua D. Drake" <jd@commandprompt.com> writes:
FATAL: pre-existing shared memory block (key 5432001, ID 19202077) is
still in use
HINT: If you're sure there are no old server processes still running,
remove the shared memory block with the command "ipcclean", "ipcrm", or
just delete the file "postmaster.pid".
As we can see pg_ctl knows that the PID does not exist. If the PID does
not exist is it safe to assume that we can remove the file? So that we
may start again?
The error message is warning you that there appear to still be live
backends in the data directory, even though the original postmaster
process is gone (crashed?).
Yes, I am aware of that. My actual point was that pg_ctl tests to see if
the process is alive when you issue the stop. It comes back with the
error that the PID is no longer available to kill.
I was just wondering if we could make pg_ctl a little smarter, is all.
If pg_ctl can't start because the pid file exists, test for the
existence of the pid; if the pid does not exist, test for the existence
of **any** postgres process (grep? egad...); if none exists, overwrite
the pid file and start?
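Spelled out as a decision procedure, the idea above would look roughly like the Python sketch below. It only illustrates the proposal, it is not existing pg_ctl behavior, and the function name and the returned strings are hypothetical. The three inputs are passed in as plain booleans because how to compute the third one reliably (for this data directory only) is the hard part.

```python
def proposed_start_action(pid_file_exists, recorded_pid_alive, any_postgres_running):
    """Return the action the proposed, smarter 'pg_ctl start' would take."""
    if not pid_file_exists:
        # Clean state: nothing to check.
        return "start"
    if recorded_pid_alive:
        # The postmaster named in the pid file is still there.
        return "refuse: postmaster already running"
    if any_postgres_running:
        # The postmaster is gone, but some postgres process survives.
        return "refuse: other postgres processes found"
    # Stale pid file and no postgres process at all.
    return "overwrite pid file and start"
```

The whole scheme is only as safe as the any_postgres_running check, which would have to be at least as reliable as whatever the server already does at startup.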
If that is the case, forcibly starting a
new postmaster is a fine recipe for creating unrecoverable data
corruption. So having pg_ctl auto-remove the file is horribly dangerous
and is NOT going to happen.
Please understand that I did not make this suggestion lightly. I recognize
very well (as I have had to deal with customers who have done it) the
dangers here.
How did you get into this state anyway?
Power off on a dev machine ;)
Sincerely,
Joshua D. Drake
"Joshua D. Drake" <jd@commandprompt.com> writes:
I was just wondering if we could make pg_ctl a little smarter as all.
If pg_ctl can't start because the pid file exists, test for the
existence of the pid, if the pid does not exist test for the existence
of **any** postgres process (grep? egad...), if none exists overwrite
the pid file and start?
This cannot be any smarter than the existing test in the postmaster,
and is most likely to be much stupider.
How did you get into this state anyway?
Power off on a dev machine ;)
Does the dev machine run more than one postmaster? I've occasionally
seen similar issues when restarting a clutch of dev postmasters ---
the kernel may assign a shmem id to one of them that belonged to another
one in the previous cycle, and if you already started that other one
then the second gets confused. 8.0 and up have a test that should deal
correctly with this; what version did you see failing exactly?
regards, tom lane
Power off on a dev machine ;)
Does the dev machine run more than one postmaster?
No.
I've occasionally
seen similar issues when restarting a clutch of dev postmasters ---
the kernel may assign a shmem id to one of them that belonged to another
one in the previous cycle, and if you already started that other one
then the second gets confused. 8.0 and up have a test that should deal
correctly with this; what version did you see failing exactly?
This is on my personal dev machine and I am running 8.1Dev.
Sincerely,
Joshua D. Drake
"Joshua D. Drake" <jd@commandprompt.com> writes:
Does the dev machine run more than one postmaster?
No.
Hmm, it should be pretty impossible to see this if the machine's just
been rebooted and there are no other postmasters running. If you can
replicate it, could you send along the output of "ipcs -m -a" along
with the contents of the postmaster.pid file? Also, what's the platform
exactly?
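When comparing the two pieces of output, the shared memory key and ID recorded in postmaster.pid can be matched against the ipcs rows. The Python sketch below does that bookkeeping. It assumes the three-line layout (PID, data directory, then key and ID) used by servers of this era; newer releases lay the file out differently. The helper names and the data directory path in the sample are hypothetical, while the PID, key, and ID are the ones from the session earlier in the thread.

```python
def parse_pid_file(text):
    """Split postmaster.pid text into PID, data directory, and shmem key/ID.

    Assumed layout: line 1 = PID, line 2 = data directory,
    line 3 = "<shmem key> <shmem id>" (may be absent).
    """
    lines = text.splitlines()
    info = {"pid": int(lines[0]), "data_dir": None, "shm_key": None, "shm_id": None}
    if len(lines) > 1:
        info["data_dir"] = lines[1].strip()
    if len(lines) > 2:
        parts = lines[2].split()
        if len(parts) >= 2:
            info["shm_key"], info["shm_id"] = int(parts[0]), int(parts[1])
    return info


def ipcs_key(shm_key):
    """Format a decimal shmem key the way Linux ipcs prints it (0x + 8 hex digits)."""
    return "0x%08x" % shm_key


# Sample text: the data directory path is made up for illustration.
SAMPLE_PID_FILE = "10180\n/usr/local/pgsql/data\n   5432001  19202077\n"

if __name__ == "__main__":
    info = parse_pid_file(SAMPLE_PID_FILE)
    print(info["pid"], ipcs_key(info["shm_key"]), info["shm_id"])
```

The hex form is what to look for in the key column of the ipcs listing; the ID should match the shmid column of the same row.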
regards, tom lane
Tom Lane wrote:
"Joshua D. Drake" <jd@commandprompt.com> writes:
Does the dev machine run more than one postmaster?
No.
Hmm, it should be pretty impossible to see this if the machine's just
been rebooted
It wasn't a reboot; it was a total power loss and then a startup.
and there are no other postmasters running. If you can
replicate it, could you send along the output of "ipcs -m -a" along
with the contents of the postmaster.pid file?
I will give it a shot a little later today.
Also, what's the platform
exactly?
FC3.
J