Postgres not starting at boot(FreeBSD) - startup script not releasing

Started by Dave [Hawk-Systems], about 24 years ago, 9 messages, general
#1Dave [Hawk-Systems]
dave@hawk-systems.com

Try this on for size... recently during a reboot (first in about 3 months for
this particular server) our entire rc.d directory failed to start... after some
hacking of the rc file to output some helpful debugging, it was apparent that the
010.pgsql.sh script in /usr/local/etc/rc.d was timing out and causing any
directives thereafter not to be processed.

Running the script manually as root starts the postmaster but doesn't return you
to the command prompt. ^C and checking the errlog shows

Waiting for postmaster starting up..DEBUG: Data Base System is starting up at
Sat Mar 9 17:05:45 2002
DEBUG: Data Base System was shut down at Sat Mar 9 17:05:39 2002
DEBUG: Data Base System is in production state at Sat Mar 9 17:05:45 2002
Fast Shutdown request at Sat Mar 9 17:05:48 2002
DEBUG: Data Base System shutting down at Sat Mar 9 17:05:48 2002
DEBUG: Data Base System shut down at Sat Mar 9 17:05:48 2002

Can force it to return to the command prompt by adding a "&" and pressing <cr> twice

web1# /usr/local/etc/rc.d/010.pgsql.sh start &
[1] 4635
web1#
[1] + Suspended (tty output) /usr/local/etc/rc.d/010.pgsql.sh start
web1#

and postgres stays up and frees the terminal. Output in errlog for this is...

Waiting for postmaster starting up..DEBUG: Data Base System is starting up at
Sat Mar 9 17:07:21 2002
DEBUG: Data Base System was shut down at Sat Mar 9 17:05:48 2002
DEBUG: Data Base System is in production state at Sat Mar 9 17:07:21 2002

No idea what could be causing the script not to function as it is the EXACT same
script as on the other servers we are operating (did a diff just to be sure)

In the interim we removed the script from the startup dir... any ideas as to
why this is occurring?

Installed from port, left the port startup script as is... listed below.
Appreciate any feedback/comments.

Dave

# $FreeBSD: ports/databases/postgresql7/files/pgsql.sh.tmpl,v 1.9 2000/12/11 03:22:07 steve Exp $
#
# For postmaster startup options, edit $PGDATA/postmaster.opts.default
# Preinstalled options are -i -o "-F"

case $1 in
start)
    [ -d /usr/local/pgsql/lib ] && /sbin/ldconfig -m /usr/local/pgsql/lib
    [ -x /usr/local/pgsql/bin/pg_ctl ] && {
        su -l pgsql -c \
            'exec /usr/local/pgsql/bin/pg_ctl -w start > /usr/local/pgsql/errlog 2>&1'
        echo -n ' pgsql'
    }
    ;;

stop)
    [ -x /usr/local/pgsql/bin/pg_ctl ] && {
        exec su -l pgsql -c 'exec /usr/local/pgsql/bin/pg_ctl -w -m fast stop'
    }
    ;;

status)
    [ -x /usr/local/pgsql/bin/pg_ctl ] && {
        exec su -l pgsql -c 'exec /usr/local/pgsql/bin/pg_ctl status'
    }
    ;;

*)
    echo "usage: `basename $0` {start|stop|status}" >&2
    exit 64
    ;;
esac
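
The rc debugging mentioned at the top of this message is not shown in the thread. One way to find the script that never returns is sketched below. The function name trace_rc_dir is hypothetical and is not part of FreeBSD's /etc/rc; it only logs each script before and after running it, so a hanging script is the last "starting" line with no matching "finished" line.

```shell
#!/bin/sh
# Hypothetical helper, not part of FreeBSD's /etc/rc.
# Runs every executable *.sh file in a directory with "start", printing the
# script name before it runs and its exit status and elapsed seconds after.
trace_rc_dir() {
    dir=$1
    for script in "$dir"/*.sh; do
        # Skip non-executable files (and the literal pattern if nothing matched).
        [ -x "$script" ] || continue
        echo "starting: $script" >&2
        begin=$(date +%s)
        "$script" start
        status=$?
        end=$(date +%s)
        echo "finished: $script (exit $status, $((end - begin))s)" >&2
    done
}

# Example: trace_rc_dir /usr/local/etc/rc.d
```

Because the messages go to stderr, they stay visible even when a script's own output is redirected.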

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave [Hawk-Systems] (#1)
Re: Postgres not starting at boot(FreeBSD) - startup script not releasing

"Dave" <dave@hawk-systems.com> writes:

DEBUG: Data Base System is starting up at Sat Mar 9 17:05:45 2002
DEBUG: Data Base System was shut down at Sat Mar 9 17:05:39 2002
DEBUG: Data Base System is in production state at Sat Mar 9 17:05:45 2002
Fast Shutdown request at Sat Mar 9 17:05:48 2002
DEBUG: Data Base System shutting down at Sat Mar 9 17:05:48 2002
DEBUG: Data Base System shut down at Sat Mar 9 17:05:48 2002

It looks like something is hitting the postmaster with a SIGINT signal
as soon as it starts. Got any idea what might be doing that? It's
not pg_ctl, for sure (unless the "something" is firing your init
script with a 'stop' option). In any case I think you should be looking
for outside agencies, not a problem directly in this init script.

regards, tom lane
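
For reference, the postmaster chooses its shutdown mode from the signal it receives: SIGTERM requests a smart shutdown, SIGINT a fast shutdown, and SIGQUIT an immediate shutdown. A ^C typed at a terminal sends SIGINT to the foreground process group, which would produce exactly the "Fast Shutdown request" line in the log. A minimal sketch of that mapping follows; the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical lookup: which shutdown mode the postmaster performs
# when it receives the given signal.
shutdown_mode_for_signal() {
    case $1 in
        TERM) echo smart ;;      # wait for clients to disconnect
        INT)  echo fast ;;       # abort open transactions, then shut down
        QUIT) echo immediate ;;  # stop without a clean shutdown
        *)    echo unknown; return 1 ;;
    esac
}

# Example: shutdown_mode_for_signal INT
```

These are the same three modes that pg_ctl accepts through its -m option.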

#3Dave [Hawk-Systems]
dave@hawk-systems.com
In reply to: Tom Lane (#2)
Re: Postgres not starting at boot(FreeBSD) - startup script not releasing

Sorry, should point out that the stop results from hitting ^C after
running the script manually. The script runs and postgres starts, but
from reading the startup script, it is waiting for the pid file to appear before
reporting success... and it isn't doing this. Or at least it isn't exiting and
leaving the postmaster running. It just sits there... thus the ^C to regain the
terminal.

Opening two terminals, I can run the start script, and while the first terminal
is sitting there waiting for the script to release control, move to the second
terminal and view the results... postmaster running fine, pid file there, all
normal.

If I execute the script with the & behind it, it lets everything through after
I enter another <cr>, which from what I can see suspends the session, which then
clears normally. (Making sense?)

Confused still as to the cause or how to rectify.

Dave
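
The second-terminal check described above (postmaster running, pid file present) can be scripted. This is a minimal sketch, assuming the first line of postmaster.pid in the data directory holds the postmaster's PID; the function name and the example data directory path are assumptions, not taken from the thread.

```shell
#!/bin/sh
# Hypothetical check: succeed if the process named in
# <datadir>/postmaster.pid exists.
postmaster_alive() {
    pidfile=$1/postmaster.pid
    [ -r "$pidfile" ] || return 1
    # Assumption: the PID is on the first line of the file.
    pid=$(head -n 1 "$pidfile")
    # kill -0 delivers no signal; it only tests that the process exists
    # and that the caller is allowed to signal it.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Example (data directory path is an assumption):
#   postmaster_alive /usr/local/pgsql/data && echo "postmaster is up"
```

Run it as root or as the pgsql user, since kill -0 fails for processes the caller may not signal.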


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Sunday, March 10, 2002 11:22 AM
To: Dave
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Postgres not starting at boot(FreeBSD) - startup
script not releasing

"Dave" <dave@hawk-systems.com> writes:

DEBUG: Data Base System is starting up at Sat Mar 9 17:05:45 2002
DEBUG: Data Base System was shut down at Sat Mar 9 17:05:39 2002
DEBUG: Data Base System is in production state at Sat Mar 9 17:05:45 2002
Fast Shutdown request at Sat Mar 9 17:05:48 2002
DEBUG: Data Base System shutting down at Sat Mar 9 17:05:48 2002
DEBUG: Data Base System shut down at Sat Mar 9 17:05:48 2002

It looks like something is hitting the postmaster with a SIGINT signal
as soon as it starts. Got any idea what might be doing that? It's
not pg_ctl, for sure (unless the "something" is firing your init
script with a 'stop' option). In any case I think you should be looking
for outside agencies, not a problem directly in this init script.

regards, tom lane

#4Dave [Hawk-Systems]
dave@hawk-systems.com
In reply to: Dave [Hawk-Systems] (#3)
Re: Postgres not starting at boot(FreeBSD) - startup script not releasing

hold the farm...

Try this on for size... recently during a reboot (first in about 3 months for
this particular server) our entire rc.d directory failed to start... after some
hacking of the rc file to output some helpful debugging, it was apparent that the
010.pgsql.sh script in /usr/local/etc/rc.d was timing out and causing any
directives thereafter not to be processed.

have you tried manually doing "pg_ctl restart" to see if any problems
pop-up? Maybe it is not a script error, but some other issue with the db
server.

Did the following: stopped the server completely, then ran this.

web5# su -l pgsql -c 'exec /usr/local/pgsql/bin/pg_ctl start'
postmaster successfully started up.
web5# DEBUG: Data Base System is starting up at Sun Mar 10 14:32:46 2002
DEBUG: Data Base System was shut down at Sun Mar 10 14:32:04 2002
DEBUG: Data Base System is in production state at Sun Mar 10 14:32:46 2002

web5#
web5# su -l pgsql -c 'exec /usr/local/pgsql/bin/pg_ctl restart'
Smart Shutdown request at Sun Mar 10 14:33:25 2002
Waiting for postmaster shutting down..................................The Data Base System is shutting down
..........The Data Base System is shutting down
...The Data Base System is shutting down
....The Data Base System is shutting down
...The Data Base System is shutting down
.........pg_ctl: postmaster does not shut down
web5# The Data Base System is shutting down
The Data Base System is shutting down
The Data Base System is shutting down
The Data Base System is shutting down

Hmmm... check that it's still running...

web5# ps -aux | grep pgsql
pgsql 81016 0.0 0.1 628 452 p0 I 2:32PM 0:00.00 /bin/sh /usr/loca
pgsql 81018 0.0 0.3 4080 2404 p0 I 2:32PM 0:00.03 /usr/local/pgsql/
pgsql 81082 0.0 0.4 4508 3008 p0 I 2:33PM 0:00.03 /usr/local/pgsql/
pgsql 81083 0.0 0.4 4556 3364 p0 I 2:33PM 0:00.06 /usr/local/pgsql/
web5#

OK, let's try and use the rc.d script...

web5# /usr/local/etc/rc.d/010* stop
Fast Shutdown request at Sun Mar 10 14:37:28 2002
Aborting any active transaction...
Waiting for postmaster shutting down..FATAL 1: The system is shutting down
FATAL 1: The system is shutting down
NOTICE: AbortTransaction and not in in-progress state
.NOTICE: AbortTransaction and not in in-progress state
DEBUG: Data Base System shutting down at Sun Mar 10 14:37:28 2002
DEBUG: Data Base System shut down at Sun Mar 10 14:37:28 2002
done.
postmaster successfully shut down.
web5#

That's interesting, perhaps pg_ctl is hosed?

web5# ps -aux | grep pgsql
web5#

Ideas?

Dave

#5Dmitry Morozovsky
marck@rinet.ru
In reply to: Dave [Hawk-Systems] (#1)
Re: Postgres not starting at boot(FreeBSD) - startup script

On Sun, 10 Mar 2002, Dave wrote:

I use the following lines (at /usr/local/etc/rc.d/pgsql.sh)

-- 8< --
#!/bin/sh
PGBIN=/usr/local/pgsql/bin

cmd="$1"
: ${cmd:=start}

case $cmd in
start)
    [ -d /usr/local/pgsql/lib ] && /sbin/ldconfig -m /usr/local/pgsql/lib
    [ -x ${PGBIN}/pg_ctl ] && {
        echo -n 'pgsql '
        su -l pgsql -c \
            '[ -d ${PGDATA} ] && exec /usr/local/pgsql/bin/pg_ctl start -s -l ~pgsql/log/errlog'
    }
    ;;

stop)
    [ -x ${PGBIN}/pg_ctl ] && {
        echo -n 'pgsql '
        su -l pgsql -c 'exec /usr/local/pgsql/bin/pg_ctl stop -s -m fast'
    }
    ;;

status)
    [ -x ${PGBIN}/pg_ctl ] && {
        exec su -l pgsql -c 'exec /usr/local/pgsql/bin/pg_ctl status'
    }
    ;;

*)
    echo "usage: `basename $0` {start|stop|status}" >&2
    exit 64
    ;;
esac

-- 8< --

D> Try this on for size... recently during a reboot (first in about 3 months for
D> this particular server) our entire rc.d directory failed to start... after some
D> hacking of the rc file to output some helpful debuggin, it was apparent that the
D> 010.pgsql.sh script in /usr/local/etc/rc.d was timing out and causing any
D> directives thereafter not to be processed.

Sincerely,
D.Marck [DM5020, DM268-RIPE, DM3-RIPN]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
------------------------------------------------------------------------

#6Matthew D. Fuller
fullermd@over-yonder.net
In reply to: Dave [Hawk-Systems] (#1)
Re: Postgres not starting at boot(FreeBSD) - startup script not releasing

On Sun, Mar 10, 2002 at 09:11:11AM -0500 I heard the voice of
Dave, and lo! it spake thus:

Try this on for size... recently during a reboot (first in about 3 months for
this particular server) our entire rc.d directory failed to start... after some
hacking of the rc file to output some helpful debuggin, it was apparent that the
010.pgsql.sh script in /usr/local/etc/rc.d was timing out and causing any
directives thereafter not to be processed.

At a guess, you've set it up to not automatically trust local users, so
the default options which 'wait' for the server to come up (and "waits"
by having psql try connecting as the postgres user) waits for a long long
time for somebody to give it the password it now requires.

I find that rather annoying, and miss it every time, until the rc script
hangs. Check the options and figure out which one it is you have to take
out, I can't recall offhand.

--
Matthew Fuller (MF4839) | fullermd@over-yonder.net
Unix Systems Administrator | fullermd@futuresouth.com
Specializing in FreeBSD | http://www.over-yonder.net/

"The only reason I'm burning my candle at both ends, is because I
haven't figured out how to light the middle yet"
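
The situation described above can be detected before the next reboot by checking whether a local connection still succeeds without a password. This is a minimal sketch, assuming a psql recent enough to accept --no-password (the 7.x psql discussed in this thread may not have it), the FreeBSD port's pgsql superuser, and the template1 database. The function names and the PROBE override are hypothetical.

```shell
#!/bin/sh
# One connection attempt that fails instead of prompting for a password.
# Assumptions: psql supports --no-password; superuser is "pgsql".
probe_connection() {
    psql --no-password -U pgsql -d template1 -c 'SELECT 1' >/dev/null 2>&1
}

# Report whether a passwordless local connection works.
# PROBE may name a different command to run instead of probe_connection.
check_local_trust() {
    if "${PROBE:-probe_connection}"; then
        echo "ok: passwordless local connection works"
    else
        echo "warning: passwordless local connection failed; pg_ctl -w may hang" >&2
        return 1
    fi
}

# Example: su -l pgsql -c '. /path/to/this/file; check_local_trust'
```

A failure here does not prove the server is down, only that the kind of connection the wait logic relies on would not go through unattended.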

#7Dave [Hawk-Systems]
dave@hawk-systems.com
In reply to: Matthew D. Fuller (#6)
Re: Postgres not starting at boot(FreeBSD) - startup script not releasing < solved

Bingo! Dumb move. Switched everything to password authentication a few months back
and never had occasion to restart after that. Will work on tweaking pg_hba.conf.

Thanks Matthew... if you are ever in Toronto, I owe you a beer.

Dave


#8Matthew D. Fuller
fullermd@over-yonder.net
In reply to: Dave [Hawk-Systems] (#7)
Re: Postgres not starting at boot(FreeBSD) - startup script not releasing < solved

On Sun, Mar 10, 2002 at 06:11:21PM -0500 I heard the voice of
Dave, and lo! it spake thus:

Bingo! Dumb move. Dropped everything to password a few months back, never had
the occasion to restart after that. Will work on tweaking the pg_hba.conf

FWIW (after a quick glance at the default script and the manpage), "-w"
is the pg_ctl option that makes it wait. I just take it out; it only
takes PG a few seconds to initialize, so it's ready to go long before
something would need to connect to it.

It could also be said that having -w implemented as invoking psql to try
to connect as the DB superuser assuming no password is a rather
inappropriate way of going about it, but that's another can of worms.
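
Taking -w out of the start line can be done by hand or with a one-line sed. The sketch below prints an edited copy rather than changing the file in place, so the result can be reviewed first. The function name is hypothetical, and the pattern matches only the exact text "pg_ctl -w start" used in the port's script, so the stop line is left untouched.

```shell
#!/bin/sh
# Hypothetical helper: print the given rc script (or stdin) with -w removed
# from the "pg_ctl -w start" invocation. Other pg_ctl lines are unchanged.
strip_wait_flag() {
    sed 's/pg_ctl -w start/pg_ctl start/' "$@"
}

# Example:
#   strip_wait_flag /usr/local/etc/rc.d/010.pgsql.sh > /tmp/010.pgsql.sh.new
#   diff /usr/local/etc/rc.d/010.pgsql.sh /tmp/010.pgsql.sh.new
```

Without -w the script returns as soon as the postmaster has been launched, so anything started later that needs the database must tolerate a few seconds of startup time.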


#9Chad R. Larson
clarson@eldocomp.com
In reply to: Matthew D. Fuller (#8)
Re: Postgres not starting at boot(FreeBSD) - startup

At 02:54 AM 3/11/2002 , Matthew D. Fuller wrote:

It could also be said that having -w implemented as invoking psql to try
to connect as the DB superuser assuming no password is a rather
inappropriate way of going about it, but that's another can of worms.

It doesn't wait for the PID file to be created (at least, not on our 7.1.2
systems). It attempts to connect to a database using psql, and loops until
that connection is successful. Which it won't be if you've got a
password, because the script will wait for some entity to type the
password, and hang.

My fix here was a hack in pg_ctl, right at the bottom where the script is
looping on a psql attempt to connect to a database to prove the system is
up. I added a "-h localhost" to the psql invocation to force a TCP
connection, and then used "ident" instead of password for the authorization.

-crl
--
Chad R. Larson (CRL22) chad@eldocomp.com
Eldorado Computing, Inc. 602-604-3100
5353 North 16th Street, Suite 400
Phoenix, Arizona 85016-3228
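
The patched pg_ctl itself is not shown in the thread. The loop described above might look roughly like the sketch below, with one addition that is not from the thread: a retry limit, so that a connection that can never succeed ends in an error instead of hanging the boot. The function names and the TRY and DELAY overrides are hypothetical, and the matching ident entry in pg_hba.conf still has to be set up separately.

```shell
#!/bin/sh
# One connection attempt over TCP, as described above ("-h localhost").
# stdin is /dev/null so nothing waits on keyboard input.
try_connect() {
    psql -h localhost -l </dev/null >/dev/null 2>&1
}

# wait_for_server [MAX_TRIES]: retry until a connection succeeds,
# giving up after MAX_TRIES attempts (default 60).
# TRY may name a different probe command; DELAY sets seconds between tries.
wait_for_server() {
    max=${1:-60}
    tries=0
    until "${TRY:-try_connect}"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max" ]; then
            echo "server did not accept connections after $max tries" >&2
            return 1
        fi
        sleep "${DELAY:-1}"
    done
    echo "server is accepting connections"
}

# Example: wait_for_server 30 || exit 1
```

The limit is the design point: an rc script that can fail is easier to live with at boot than one that can block every script after it.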