postgresql-[any version] from FreeBSD ports - startup problems after crash
Hello !
The server rebooted unexpectedly after a power failure.
It left a stale postmaster.pid file, so the postmaster refused to start with the error:
bill postgres[600]: [1-1] FATAL: file "postmaster.pid" already exists
I think the startup script and/or pg_ctl should check whether that
process really exists and is a postmaster, so that the DBMS server
starts after any hard reboot.
I changed this block of the startup script:
postgresql_command()
{
    su -l ${postgresql_user} -c "exec ${command} ${command_args} ${rc_arg}"
}
to
postgresql_cmd()
{
    su -l ${postgresql_user} -c "exec ${command} ${command_args} ${rc_arg}"
}

postgresql_command()
{
    if [ ".$1" = ".start" ]; then
        pidfile="${postgresql_data}/postmaster.pid"
        if [ -e "${pidfile}" ]; then
            # check whether the postmaster process really exists
            pid_fromfile=`head -n 1 "${pidfile}"`
            real_pid=`ps ax | grep -v grep | grep postmaster | grep "${postgresql_data}" | awk '{print $1}'`
            if [ "x${pid_fromfile}" = "x${real_pid}" ]; then
                echo "Postmaster for datadir ${postgresql_data} is already running with pid ${real_pid}"
            else
                # we have a stale pidfile, remove it
                unlink "${pidfile}"
                # and start the postmaster safely
                postgresql_cmd
            fi
        else
            # no .pid file exists, clean startup
            postgresql_cmd
        fi
    else
        postgresql_cmd
    fi
}
I hope this covers all cases of a stale .pid file...
--
Ruslan A Dautkhanov
Ruslan A Dautkhanov <rusland@scn.ru> writes:
> The server rebooted unexpectedly after a power failure.
> It left a stale postmaster.pid file, so the postmaster refused to start with the error:
> bill postgres[600]: [1-1] FATAL: file "postmaster.pid" already exists
You probably need a newer postgres version (you didn't say what you are
using) and/or a more carefully written start script.
Your proposed change in the start script is useless --- do you think the
postmaster doesn't check that already? Furthermore, it's actually
dangerous for reasons we need not get into here; suffice to say that
automated removal of that lock file is NOT a good idea.
The problem comes up when the startup timing is just different enough
that the PID belonging to the postmaster in the previous boot cycle now
belongs to the shell that's launching it. The postmaster sees a live
process of the correct userid (ie, postgres) and has to assume that
that's a pre-existing postmaster.
We've fixed this in recent releases by having the postmaster also check
for a match to its parent process ID (getppid). The care in the start
script comes because this only works for one level up. Therefore, you
can't "su -c pg_ctl start ..." because that would create three levels of
postgres-owned processes (shell, pg_ctl, postmaster) and if the PID
count is off by 2 instead of 1 then we still lose. You have to invoke
the postmaster directly, "su -c postmaster ...". (Hm, actually it might
work to do "su -c 'exec pg_ctl ...'" ... I have not tried that.)
regards, tom lane
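A simplified shell sketch of the lock-file check described above may make the failure easier to see. It is not PostgreSQL's actual code (the real check is done in C inside the server); the function name `lockfile_in_use` is hypothetical, and `kill -0` stands in for "a live process of the correct userid", since for a non-root caller it only succeeds on processes that exist and belong to the same user.

```sh
#!/bin/sh
# Hypothetical sketch, NOT PostgreSQL source: decide whether the PID
# recorded in a postmaster.pid-style file looks like a running postmaster.
# Returns 0 ("in use, refuse to start") or 1 ("stale").
lockfile_in_use() {
    other_pid=$(head -n 1 "$1")
    # An empty or non-numeric first line cannot name a live process.
    case "$other_pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Our own PID, or our parent's (the getppid() fix), cannot belong to
    # an old postmaster, so treat the file as stale in those cases.
    if [ "$other_pid" -eq "$$" ] || [ "$other_pid" -eq "$PPID" ]; then
        return 1
    fi
    # kill -0 sends no signal; it only reports whether we could signal
    # the process, i.e. it exists and (unless we are root) runs as our user.
    if kill -0 "$other_pid" 2>/dev/null; then
        return 0
    fi
    return 1
}
```

If the old PID now belongs to a postgres-owned grandparent (the "off by 2" case), it matches neither `$$` nor `$PPID`, `kill -0` succeeds, and the sketch wrongly reports the file as in use, which is the situation the start script has to avoid.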
On Mon, May 15, 2006 at 09:23:33AM -0400, Tom Lane wrote:
> We've fixed this in recent releases by having the postmaster also check
> for a match to its parent process ID (getppid). The care in the start
> script comes because this only works for one level up. Therefore, you
> can't "su -c pg_ctl start ..." because that would create three levels of
> postgres-owned processes (shell, pg_ctl, postmaster) and if the PID
> count is off by 2 instead of 1 then we still lose. You have to invoke
> the postmaster directly, "su -c postmaster ...". (Hm, actually it might
> work to do "su -c 'exec pg_ctl ...'" ... I have not tried that.)
Except that the shell that's running su would be root, not pgsql, at
least in the case of FreeBSD. The guts of the current port's rc.d file
are:
su -l ${postgresql_user} -c "exec ${command} ${command_args} ${rc_arg}"
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
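Both the "su -c 'exec pg_ctl ...'" idea and the port's rc.d line rely on `exec`, which replaces the running shell with the command instead of forking a child, so one process level disappears. The small sketch below (the function names are made up for illustration) shows the difference: each function starts an intermediate `sh` that prints its own PID and then runs an inner `sh` that prints its PID.

```sh
#!/bin/sh
# Illustration only: compare process levels with and without exec.

# Without exec the inner sh is forked as a new child, so two different
# PIDs are printed. The trailing "true" stops the intermediate shell
# from turning its last command into an implicit exec.
show_pids_without_exec() {
    sh -c 'echo "$$"; sh -c "echo \$\$"; true'
}

# With exec the inner sh replaces the intermediate shell in the same
# process, so the same PID is printed twice.
show_pids_with_exec() {
    sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

show_pids_without_exec
show_pids_with_exec
```

In the process model described in this thread, this is why `exec` matters: the shell started by su becomes pg_ctl itself, so the postgres-owned chain (shell, pg_ctl, postmaster) is one level shorter.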
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> Except that the shell that's running su would be root, not pgsql, at
> least in the case of FreeBSD. The guts of the current port's rc.d file
> are:
> su -l ${postgresql_user} -c "exec ${command} ${command_args} ${rc_arg}"
Yeah, but what's the ${command} ?
If it's pg_ctl then all he's missing is the recent change to check
getppid. If it's execing postmaster directly then maybe we need
another theory.
regards, tom lane
Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
>> Except that the shell that's running su would be root, not pgsql, at
>> least in the case of FreeBSD. The guts of the current port's rc.d
>> file are:
>> su -l ${postgresql_user} -c "exec ${command} ${command_args} ${rc_arg}"
> Yeah, but what's the ${command} ?
> If it's pg_ctl then all he's missing is the recent change to check
> getppid. If it's execing postmaster directly then maybe we need
> another theory.
It's pg_ctl....
command=${prefix}/bin/pg_ctl
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
On Mon, May 15, 2006 at 02:20:51PM -0500, Larry Rosenman wrote:
>> Yeah, but what's the ${command} ?
>> If it's pg_ctl then all he's missing is the recent change to check
>> getppid. If it's execing postmaster directly then maybe we need
>> another theory.
> It's pg_ctl....
> command=${prefix}/bin/pg_ctl
http://lnk.nu/freebsd.org/9fu.tmpl is the file in ports CVS.
http://jim.nasby.net/010.pgsql.sh.txt is the file as it exists on one of
my systems.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Hello !
Tom Lane wrote:
> Ruslan A Dautkhanov <rusland@scn.ru> writes:
>> The server rebooted unexpectedly after a power failure.
>> It left a stale postmaster.pid file, so the postmaster refused to start with the error:
>> bill postgres[600]: [1-1] FATAL: file "postmaster.pid" already exists
> You probably need a newer postgres version (you didn't say what you are
> using) and/or a more carefully written start script.
I have FreeBSD 6.0-STABLE
and PostgreSQL 8.1.3 on i386-portbld-freebsd6.0, compiled by GCC cc
(GCC) 3.4.4 [FreeBSD].
As I said, PostgreSQL was built from FreeBSD ports, so the startup script
is also from ports...
> Your proposed change in the start script is useless --- do you think the
> postmaster doesn't check that already? Furthermore, it's actually
> dangerous for reasons we need not get into here; suffice to say that
> automated removal of that lock file is NOT a good idea.
I know this is not a good idea, but if it keeps the startup process
stable, it has a right to exist...
> The problem comes up when the startup timing is just different enough
> that the PID belonging to the postmaster in the previous boot cycle now
> belongs to the shell that's launching it. The postmaster sees a live
> process of the correct userid (ie, postgres) and has to assume that
> that's a pre-existing postmaster.
> We've fixed this in recent releases by having the postmaster also check
> for a match to its parent process ID (getppid). The care in the start
> script comes because this only works for one level up. Therefore, you
> can't "su -c pg_ctl start ..." because that would create three levels of
> postgres-owned processes (shell, pg_ctl, postmaster) and if the PID
> count is off by 2 instead of 1 then we still lose. You have to invoke
> the postmaster directly, "su -c postmaster ...". (Hm, actually it might
> work to do "su -c 'exec pg_ctl ...'" ... I have not tried that.)
Thank you for the information; I'll experiment with it and try to avoid
a big if-if-if block in the startup script...
--
best regards,
Ruslan A Dautkhanov