postgresql-[any version] from FreeBSD ports - startup problems after crash

Started by Ruslan A Dautkhanov almost 20 years ago, 7 messages, bugs

rusland@scn.ru

almost 20 years ago

Hello !

The server was rebooted unexpectedly after a power failure.
It left a stale postmaster.pid file behind, so the postmaster failed to start with this error:
bill postgres[600]: [1-1] FATAL: file "postmaster.pid" already exists

I think the startup script and/or pg_ctl should check whether that
process really exists and is a postmaster, so that the DBMS server
starts after any hard reboot.

I changed this block of the startup script:

postgresql_command()
{
    su -l ${postgresql_user} -c "exec ${command} ${command_args} ${rc_arg}"
}

into this:

postgresql_cmd()
{
    su -l ${postgresql_user} -c "exec ${command} ${command_args} ${rc_arg}"
}

postgresql_command()
{
    if [ ".$1" = ".start" ]; then
        pidfile="${postgresql_data}/postmaster.pid"
        if [ -e "${pidfile}" ]; then
            # check whether the postmaster process really exists
            pid_fromfile=`head -n 1 "${pidfile}"`
            real_pid=`ps ax | grep -v grep | grep postmaster | grep "${postgresql_data}" | awk '{print $1}'`
            if [ "x${pid_fromfile}" = "x${real_pid}" ]; then
                echo "Postmaster for datadir ${postgresql_data} is already running with pid ${real_pid}"
            else
                # we have a stale pidfile, remove it
                unlink "${pidfile}"
                # and start the postmaster safely
                postgresql_cmd
            fi
        else
            # no .pid file, clean startup
            postgresql_cmd
        fi
    else
        postgresql_cmd
    fi
}

I hope this covers every case of a stale .pid file...

--
Ruslan A Dautkhanov

Tom Lane

tgl@sss.pgh.pa.us

almost 20 years ago

In reply to: Ruslan A Dautkhanov (#1)

Re: postgresql-[any version] from FreeBSD ports - startup problems after crash

Ruslan A Dautkhanov <rusland@scn.ru> writes:

> The server was rebooted unexpectedly after a power failure.
> It left a stale postmaster.pid file behind, so the postmaster failed to start with this error:
> bill postgres[600]: [1-1] FATAL: file "postmaster.pid" already exists

You probably need a newer postgres version (you didn't say what you are
using) and/or a more carefully written start script.

Your proposed change in the start script is useless --- do you think the
postmaster doesn't check that already? Furthermore, it's actually
dangerous for reasons we need not get into here; suffice to say that
automated removal of that lock file is NOT a good idea.

The problem comes up when the startup timing is just different enough
that the PID belonging to the postmaster in the previous boot cycle now
belongs to the shell that's launching it. The postmaster sees a live
process of the correct userid (ie, postgres) and has to assume that
that's a pre-existing postmaster.
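The trap can be sketched in shell. A liveness probe such as `kill -0` only reports that some process with that PID exists and that we are allowed to signal it; it cannot tell a leftover postmaster from the very shell that is now launching it. This is a hypothetical illustration, not PostgreSQL's code: the function name `pid_is_alive` and the temporary file standing in for postmaster.pid are made up.

```sh
#!/bin/sh
# Hypothetical illustration (not PostgreSQL code): a bare "is that PID
# alive?" test cannot tell an old postmaster from the launching shell.

# pid_is_alive PID: exit 0 if PID exists and this user may signal it.
# kill -0 delivers no signal; it only does the existence/permission check,
# so a live process owned by another user also reports "not alive" here.
pid_is_alive() {
    kill -0 "$1" 2>/dev/null
}

pidfile=$(mktemp)       # stands in for the stale postmaster.pid
echo "$$" > "$pidfile"  # after the reboot, the old PID now belongs to this shell

old_pid=$(head -n 1 "$pidfile")
if pid_is_alive "$old_pid"; then
    echo "PID $old_pid is alive: assuming a postmaster is already running"
else
    echo "PID $old_pid is gone: the lock file is stale"
fi
rm -f "$pidfile"
```

Because the recycled PID belongs to a live process of the same user, the check takes the "already running" branch, which is the misdiagnosis described above.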

We've fixed this in recent releases by having the postmaster also check
for a match to its parent process ID (getppid). The care in the start
script comes because this only works for one level up. Therefore, you
can't "su -c pg_ctl start ..." because that would create three levels of
postgres-owned processes (shell, pg_ctl, postmaster) and if the PID
count is off by 2 instead of 1 then we still lose. You have to invoke
the postmaster directly, "su -c postmaster ...". (Hm, actually it might
work to do "su -c 'exec pg_ctl ...'" ... I have not tried that.)
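The rule described here, restated as a small shell sketch rather than the postmaster's actual C code: a lock-file PID equal to our own PID or to our parent's PID (`$PPID`, the shell counterpart of getppid()) cannot be an old postmaster. The function name `lockfile_pid_is_stale` is hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch of the rule described above, written in sh rather than
# the postmaster's C code: a PID found in the lock file is treated as stale
# if it is our own PID, our parent's PID, or no process we may signal.

# lockfile_pid_is_stale PID: exit 0 if the lock file may be ignored,
# exit 1 if another live process of ours holds the PID.
lockfile_pid_is_stale() {
    other_pid=$1
    # The old PID was recycled for us or for our direct parent ($PPID,
    # the shell equivalent of getppid()): it cannot be an old postmaster.
    if [ "$other_pid" = "$$" ] || [ "$other_pid" = "$PPID" ]; then
        return 0
    fi
    # No process we are allowed to signal under that PID (it is gone,
    # or it belongs to another user): stale as well.
    if ! kill -0 "$other_pid" 2>/dev/null; then
        return 0
    fi
    # Any other live process of ours: assume a postmaster is running.
    return 1
}
```

Only one level up is covered: if the recycled PID belongs to a same-user grandparent, such as a shell sitting above pg_ctl, neither comparison matches and the function reports a running postmaster, which is the "off by 2" case.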

regards, tom lane

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 20 years ago

In reply to: Tom Lane (#2)

Re: postgresql-[any version] from FreeBSD ports - startup problems after crash

On Mon, May 15, 2006 at 09:23:33AM -0400, Tom Lane wrote:

> We've fixed this in recent releases by having the postmaster also check
> for a match to its parent process ID (getppid). The care in the start
> script comes because this only works for one level up. Therefore, you
> can't "su -c pg_ctl start ..." because that would create three levels of
> postgres-owned processes (shell, pg_ctl, postmaster) and if the PID
> count is off by 2 instead of 1 then we still lose. You have to invoke
> the postmaster directly, "su -c postmaster ...". (Hm, actually it might
> work to do "su -c 'exec pg_ctl ...'" ... I have not tried that.)

Except that the shell that's running su would be root, not pgsql, at
least in the case of FreeBSD. The guts of the current port's rc.d file
are:

su -l ${postgresql_user} -c "exec ${command} ${command_args} ${rc_arg}"
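This point rests on what `exec` does inside the `-c` string. A small demonstration, independent of PostgreSQL (the function name `outer_and_exec_pids` is made up), shows that a command started with `exec` keeps the PID of the shell that launched it, so shell and command count as one process level rather than two.

```sh
#!/bin/sh
# Demonstration: "exec" replaces the running shell with the command instead
# of forking a child, so the command inherits the shell's PID and no extra
# process level appears between su and the program.

# Prints two lines: the PID of an outer sh, then the PID of the sh it execs.
outer_and_exec_pids() {
    sh -c 'echo $$; exec sh -c "echo \$\$"'
}

outer_and_exec_pids
```

Both lines show the same number. Some shells already turn the last command of a `-c` string into an implicit exec, but writing `exec` explicitly makes the behavior independent of the user's login shell.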
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

Tom Lane

tgl@sss.pgh.pa.us

almost 20 years ago

In reply to: Jim Nasby (#3)

Re: postgresql-[any version] from FreeBSD ports - startup problems after crash

"Jim C. Nasby" <jnasby@pervasive.com> writes:

> Except that the shell that's running su would be root, not pgsql, at
> least in the case of FreeBSD. The guts of the current port's rc.d file
> are:
>
> su -l ${postgresql_user} -c "exec ${command} ${command_args} ${rc_arg}"

Yeah, but what's the ${command} ?

If it's pg_ctl then all he's missing is the recent change to check
getppid. If it's execing postmaster directly then maybe we need
another theory.

regards, tom lane

Larry Rosenman

ler@lerctr.org

almost 20 years ago

In reply to: Tom Lane (#4)

Re: postgresql-[any version] from FreeBSD ports - startup problems after crash

Tom Lane wrote:

> "Jim C. Nasby" <jnasby@pervasive.com> writes:
>
>> Except that the shell that's running su would be root, not pgsql, at
>> least in the case of FreeBSD. The guts of the current port's rc.d
>> file are:
>>
>> su -l ${postgresql_user} -c "exec ${command} ${command_args} ${rc_arg}"
>
> Yeah, but what's the ${command} ?
>
> If it's pg_ctl then all he's missing is the recent change to check
> getppid. If it's execing postmaster directly then maybe we need
> another theory.

It's pg_ctl....

command=${prefix}/bin/pg_ctl

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 20 years ago

In reply to: Larry Rosenman (#5)

Re: postgresql-[any version] from FreeBSD ports - startup problems after crash

On Mon, May 15, 2006 at 02:20:51PM -0500, Larry Rosenman wrote:

>> Yeah, but what's the ${command} ?
>>
>> If it's pg_ctl then all he's missing is the recent change to check
>> getppid. If it's execing postmaster directly then maybe we need
>> another theory.
>
> It's pg_ctl....
>
> command=${prefix}/bin/pg_ctl

http://lnk.nu/freebsd.org/9fu.tmpl is the file in ports CVS.
http://jim.nasby.net/010.pgsql.sh.txt is the file as it exists on one of
my systems.

Ruslan A Dautkhanov

rusland@scn.ru

almost 20 years ago

In reply to: Tom Lane (#2)

Re: postgresql-[any version] from FreeBSD ports - startup

Hello !

Tom Lane wrote:

> Ruslan A Dautkhanov <rusland@scn.ru> writes:
>
>> The server was rebooted unexpectedly after a power failure.
>> It left a stale postmaster.pid file behind, so the postmaster failed to start with this error:
>> bill postgres[600]: [1-1] FATAL: file "postmaster.pid" already exists
>
> You probably need a newer postgres version (you didn't say what you are
> using) and/or a more carefully written start script.

I have FreeBSD 6.0-STABLE
and PostgreSQL 8.1.3 on i386-portbld-freebsd6.0, compiled by GCC cc
(GCC) 3.4.4 [FreeBSD].

As I said, PostgreSQL was built from FreeBSD ports, so the startup script
also comes from ports...

> Your proposed change in the start script is useless --- do you think the
> postmaster doesn't check that already? Furthermore, it's actually
> dangerous for reasons we need not get into here; suffice to say that
> automated removal of that lock file is NOT a good idea.

I know that this is not a good idea, but if it keeps the startup process
stable, it has a right to exist...

> The problem comes up when the startup timing is just different enough
> that the PID belonging to the postmaster in the previous boot cycle now
> belongs to the shell that's launching it. The postmaster sees a live
> process of the correct userid (ie, postgres) and has to assume that
> that's a pre-existing postmaster.
>
> We've fixed this in recent releases by having the postmaster also check
> for a match to its parent process ID (getppid). The care in the start
> script comes because this only works for one level up. Therefore, you
> can't "su -c pg_ctl start ..." because that would create three levels of
> postgres-owned processes (shell, pg_ctl, postmaster) and if the PID
> count is off by 2 instead of 1 then we still lose. You have to invoke
> the postmaster directly, "su -c postmaster ...". (Hm, actually it might
> work to do "su -c 'exec pg_ctl ...'" ... I have not tried that.)

Thank you for the information. I'll play with it and try to avoid a big
if-if-if block in the startup script...

--
best regards,
Ruslan A Dautkhanov