Problems Restarting PostgreSQL Daemon
My server is rebooted infrequently, usually after a kernel upgrade and
on very rare occasions when something causes it to hang. After rebooting I
always have serious issues getting postgresql running again, even though the
startup script is part of the boot sequence. Yesterday was one of those
highly unusual hangs, and I cannot restart the service. I'd like to
understand why.
When I run the Slackware script, '/etc/rc.d/rc.postgresql start' (script
attached), I'm shown a process ID and told the daemon is already running.
For example:
Starting PostgreSQL
15342
PostgreSQL daemon already running
However, there is no process ID 15342, and no postgres running. I manually
removed /tmp/.s.PGSQL.5432 and its log file. Also -- apparently in error --
the .pid file. Makes no difference.
Perhaps there's an error in the script that I'm not seeing (I didn't write
it). Regardless, if I learn why there's a problem I can fix the script and
avoid this delay and hassle restarting postgres after the daemon's been shut
down.
TIA,
Rich
--
Richard B. Shepard, Ph.D. | Integrity Credibility
Applied Ecosystem Services, Inc. | Innovation
<http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
Attachment: rc.postgresql (text/plain; charset=US-ASCII)
Rich Shepard <rshepard@appl-ecosys.com> writes:
My server is rebooted infrequently, usually after a kernel upgrade and
on very rare occasions when something causes it to hang. After rebooting I
always have serious issues getting postgresql running again, even though the
startup script is part of the boot sequence. Yesterday was one of those
highly unusual hangs, and I cannot restart the service. I'd like to
understand why.
When I run the Slackware script, '/etc/rc.d/rc.postgresql start' (script
attached), I'm shown a process ID and told the daemon is already running.
The short answer is probably "don't use Slackware's startup script".
Some distros have PG start scripts that have had the bugs beaten out
of them, and others not so much.
Perhaps there's an error in the script that I'm not seeing (I didn't write
it). Regardless, if I learn why there's a problem I can fix the script and
avoid this delay and hassle restarting postgres after the daemon's been shut
down.
Have you read the script to see what condition causes it to issue the
mentioned error? I'd imagine that it's looking at some other lockfile
than you think.
regards, tom lane
On Tue, 22 Jul 2008, Tom Lane wrote:
The short answer is probably "don't use Slackware's startup script". Some
distros have PG start scripts that have had the bugs beaten out of them,
and others not so much.
Excellent advice, Tom. I'll take it.
Have you read the script to see what condition causes it to issue the
mentioned error? I'd imagine that it's looking at some other lockfile
than you think.
I tried following the logic, and it appears the issue now is 'invalid data
in PID file "/var/lib/pgsql/data/postmaster.pid" '. If I delete that file,
is it automatically recreated? I'm using /usr/bin/pg_ctl as user postgres.
Thanks,
Rich
I tried following the logic, and it appears the issue now is 'invalid data
in PID file "/var/lib/pgsql/data/postmaster.pid" '. If I delete that file,
is it automatically recreated?
Why not just move it and rename it? If it's recreated, great; if not,
you still have the corrupted file on hand to try to fix, no?
Rich Shepard <rshepard@appl-ecosys.com> writes:
I tried following the logic, and it appears the issue now is 'invalid data
in PID file "/var/lib/pgsql/data/postmaster.pid" '. If I delete that file,
is it automatically recreated? I'm using /usr/bin/pg_ctl as user postgres.
If you're certain there's no postmaster running, it's safe to remove
postmaster.pid. However you really shouldn't have to; the postmaster
is generally able to figure out whether a pidfile is live or not.
The "invalid data" bit is interesting though. It looks like pg_ctl
would produce that error if the pidfile exists but is empty when it
looks. This seems like a race condition hazard, though the odds of
hitting it are tiny. What's in the file exactly?
regards, tom lane
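The liveness check described above can be approximated by hand. The sketch below is not what the postmaster or pg_ctl actually runs; it is a hypothetical helper, pidfile_status, that assumes the first line of postmaster.pid holds the postmaster's PID and probes for the process with 'kill -0'. It should be run as root or as the postgres user, because 'kill -0' also fails for a live process owned by someone else.

```shell
# Classify a pidfile as missing, empty, invalid, stale, or live.
# Hypothetical helper; assumes the PID is the first line of the file.
pidfile_status() {
    pidfile=$1
    if [ ! -e "$pidfile" ]; then
        echo missing
        return 0
    fi
    if [ ! -s "$pidfile" ]; then
        # Exists but has zero length: the case suspected of producing
        # the "invalid data" message.
        echo empty
        return 0
    fi
    pid=$(head -n 1 "$pidfile")
    case $pid in
        ''|*[!0-9]*)
            # Anything other than a plain positive integer.
            echo invalid
            return 0
            ;;
    esac
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
        echo live
    else
        echo stale
    fi
}
```

For example, 'pidfile_status /var/lib/pgsql/data/postmaster.pid' printing "stale" would match the situation reported at the top of the thread, where the PID shown by the script did not exist.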
On Tue, 22 Jul 2008, Tom Lane wrote:
If you're certain there's no postmaster running, it's safe to remove
postmaster.pid. However you really shouldn't have to; the postmaster is
generally able to figure out whether a pidfile is live or not.
Tom,
I thought the postmaster knew what was current and what needed to be
replaced, but the process ID in the pidfile did not exist.
The "invalid data" bit is interesting though. It looks like pg_ctl would
produce that error if the pidfile exists but is empty when it looks. This
seems like a race condition hazard, though the odds of hitting it are
tiny. What's in the file exactly?
I deleted the .pid, but still could not get the postmaster running. Then I
'touched' the name so I had an empty file. Made no difference. While pg_ctl
tells me the server is starting, there is no /tmp/.s.PGSQL*, no pidfile, and
no postmaster process.
In the past I've managed to start the postmaster daemon manually, but
today I seem to have it FUBARed.
Rich
On 23/07/2008, Rich Shepard <rshepard@appl-ecosys.com> wrote:
When I run the Slackware script, '/etc/rc.d/rc.postgresql start' (script
attached), I'm shown a process ID and told the daemon is already running.
For example:
Since there are no official Slackware postgres packages
I'd like to ask where that script came from :) and how you
installed postgres in the first place. Happy to communicate
off the list if you prefer that.
TIA,
Rich
Cheers,
Andrej
On Wed, 23 Jul 2008, Andrej Ricnik-Bay wrote:
Since there are no official Slackware postgres packages I'd like to ask
where that script came from :) and how you installed postgres in the first
place. Happy to communicate off the list if you prefer that.
Andrej,
Unless others consider this topic to be not appropriate for the list, I
don't mind a public conversation. I thought that I attached the script to my
original message; regardless, here's the attribution:
# PostgreSQL startup script for Slackware Linux
# Copyright 2007 Adis Nezirovic <adis _at_ linux.org.ba>
# Licensed under GNU GPL v2
I upgraded postgres manually, not creating and using a Slackware package.
It worked just fine until yesterday's reboot.
Thanks,
Rich
Rich Shepard <rshepard@appl-ecosys.com> writes:
On Tue, 22 Jul 2008, Tom Lane wrote:
The "invalid data" bit is interesting though. It looks like pg_ctl would
produce that error if the pidfile exists but is empty when it looks. This
seems like a race condition hazard, though the odds of hitting it are
tiny. What's in the file exactly?
I deleted the .pid, but still could not get the postmaster running. Then I
'touched' the name so I had an empty file. Made no difference. While pg_ctl
tells me the server is starting, there is no /tmp/.s.PGSQL*, no pidfile, and
no postmaster process.
Sounds to me like the postmaster tries to start and fails. Look into
the postmaster log. (If the log is going to /dev/null, send it
someplace else...)
regards, tom lane
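To follow this advice, pg_ctl's -l option can send the server output to a file, and the reason for a failed start can then be pulled out of it. A minimal sketch, assuming the data directory /var/lib/pgsql/data and the log path /var/log/postgresql that appear elsewhere in this thread; startup_errors is a hypothetical helper name, and the labels it matches assume the default English log messages.

```shell
# Start the server with its output captured in a log file (run as root):
#   su postgres -c "pg_ctl -D /var/lib/pgsql/data -l /var/log/postgresql start"

# Print the lines that explain a failed start: FATAL/PANIC messages
# plus any DETAIL and HINT lines that accompany them.
# The pattern is not anchored, so a log_line_prefix does not hide matches.
startup_errors() {
    grep -E '(FATAL|PANIC|DETAIL|HINT):' "$1"
}
```

Something like 'startup_errors /var/log/postgresql | tail -n 20' would then show the most recent failure messages.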
On 23/07/2008, Rich Shepard <rshepard@appl-ecosys.com> wrote:
Andrej,
Hi Rich,
Unless others consider this topic to be not appropriate for the list, I
don't mind a public conversation. I thought that I attached the script to
my original message; regardless, here's the attribution:
You did - my bad. I usually ignore attachments on mailing-lists,
and did so with yours.
I upgraded postgres manually, not creating and using a Slackware package.
It worked just fine until yesterday's reboot.
Now there's an interesting piece of information :) How long
ago did you upgrade it?
From which version of pg to which version did you upgrade,
and how did you go about it? Chances are indeed that the
postmaster's logfile (/var/log/postgres) may hold crucial
information as Tom suggested.
Thanks,
Rich
Cheers,
Andrej
--
Please don't top post, and don't use HTML e-Mail :} Make your quotes concise.
On Wed, 23 Jul 2008, Andrej Ricnik-Bay wrote:
Now there's an interesting piece of information :) How long
ago did you upgrade it?
Andrej,
A month ago; June 17th to be exact.
From which version of pg to which version did you upgrade,
From 8.1.13 to 8.3.3.
and how did you go about it? Chances are indeed that the postmaster's
logfile (/var/log/postgres) may hold crucial information as Tom suggested.
Well, after digging myself into a hole, I received help here and climbed
out. It was working last week (when I made some entries into my accounting
system and viewed the local version of our web site). However, ...
... something broke during the reboot. From /var/log/postgresql:
FATAL: database files are incompatible with server
DETAIL: The database cluster was initialized with PG_CONTROL_VERSION 812,
but the server was compiled with PG_CONTROL_VERSION 833.
HINT: It looks like you need to initdb.
I still have the old pgsql (8.1.13) in a non-standard directory. I
had run initdb after cleaning up the upgrade. Should I do so again?
Thanks,
Rich
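The mismatch reported in this log can be checked directly: pg_controldata prints the control-file version of a data directory, which can be compared with the number the server complains about. A rough sketch, assuming the pg_controldata from the 8.3.3 installation is on the PATH and prints a line labelled "pg_control version number"; control_version is a hypothetical helper that only parses that output.

```shell
# Extract the control-file version from pg_controldata output on stdin.
control_version() {
    sed -n 's/^pg_control version number:[[:space:]]*//p'
}

# Usage (as the postgres user):
#   pg_controldata /var/lib/pgsql/data | control_version
# A result of 812 would mean the directory was initialized by the older
# server, matching the DETAIL line in the log; the log shows that the
# 8.3.3 server expects 833.
```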
On Tue, 2008-07-22 at 18:05 -0700, Rich Shepard wrote:
On Wed, 23 Jul 2008, Andrej Ricnik-Bay wrote:
Now there's an interesting piece of information :) How long
ago did you upgrade it?
... something broke during the reboot. From /var/log/postgresql:
FATAL: database files are incompatible with server
DETAIL: The database cluster was initialized with PG_CONTROL_VERSION 812,
but the server was compiled with PG_CONTROL_VERSION 833.
HINT: It looks like you need to initdb.
I still have the old pgsql (8.1.13) in a non-standard directory. I
had run initdb after cleaning up the upgrade. Should I do so again?
It looks to me like your init script just isn't pointing to the 8.3.3
data directory. If you are unsure you can do this:
find / -name PG_VERSION
You likely have 2 or 3 of them. Find the one that says 8.3 and make sure
your start up script points there.
Joshua D. Drake
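Because each database subdirectory also carries its own PG_VERSION file, the find above can return many hits. A sketch that narrows the list to top-level data directories, using the presence of a base/ subdirectory as a heuristic (an assumption about the standard cluster layout, not something stated in this thread); list_clusters is a hypothetical helper name.

```shell
# Print "version<TAB>directory" for each directory under the given root
# that looks like the top of a PostgreSQL data directory.
list_clusters() {
    root=${1:-/}
    find "$root" -name PG_VERSION -type f 2>/dev/null |
    while IFS= read -r f; do
        dir=$(dirname "$f")
        # Heuristic: a real data directory has a base/ subdirectory;
        # the per-database PG_VERSION files under base/<oid>/ do not.
        if [ -d "$dir/base" ]; then
            printf '%s\t%s\n' "$(head -n 1 "$f")" "$dir"
        fi
    done
}
```

Running 'list_clusters /' as root and comparing the directory shown for 8.3 with the path in the startup script covers the check suggested above.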
Thanks,
Rich
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Wed, 23 Jul 2008, Andrej Ricnik-Bay wrote:
Now there's an interesting piece of information :) How long
ago did you upgrade it?
Andrej,
I found the thread in the archives for June of this year.
Re-reading the posted results of running initdb I tried a different
approach to starting the server. Instead of using pg_ctl I used 'postgres -D
/var/lib/pgsql/data &' (while logged in as user postgres, of course.) That
cleaned up a bad shutdown (when I had to reboot the system after it hung),
fixed the missing socket, and replaced the .pid. So, it's up and running
once again.
My question is how best to modify the startup script so the postmaster
fires up when the system is rebooted. I don't see an option to 'su' to
specify the postgres user's password so I can script this. Have you any
recommendation?
Thanks,
Rich
On 27/07/2008, Rich Shepard <rshepard@appl-ecosys.com> wrote:
Andrej,
Hi Rich,
I found the thread in the archives for June of this year.
Re-reading the posted results of running initdb I tried a different
approach to starting the server. Instead of using pg_ctl I used 'postgres -D
/var/lib/pgsql/data &' (while logged in as user postgres, of course.) That
cleaned up a bad shutdown (when I had to reboot the system after it hung),
fixed the missing socket, and replaced the .pid. So, it's up and running
once again.
My question is how best to modify the startup script so the postmaster
fires up when the system is rebooted. I don't see an option to 'su' to
specify the postgres user's password so I can script this. Have you any
recommendation?
Since Slackware doesn't use the SysV style of init by default, the
easiest way for you to achieve an automatic start-up of postgres
on reboot would be to add something like
if [ -x /etc/rc.d/rc.postgres ]; then
/etc/rc.d/rc.postgres start
fi
to your /etc/rc.d/rc.local
Thanks,
Rich
Cheers,
Andrej
Rich Shepard <rshepard@appl-ecosys.com> writes:
My question is how best to modify the startup script so the postmaster
fires up when the system is rebooted. I don't see an option to 'su' to
specify the postgres user's password so I can script this.
Startup scripts invariably run as root, so 'su' isn't going to ask
for a password...
regards, tom lane
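In other words, a boot-time script can switch to the postgres user without any password handling. A minimal sketch using the paths mentioned in this thread (/usr/bin/pg_ctl, /var/lib/pgsql/data, /var/log/postgresql); the function names are hypothetical, the paths are assumed to contain no spaces, and this is not the Slackware script under discussion.

```shell
# Defaults taken from the thread; override via the environment if needed.
PGCTL=${PGCTL:-/usr/bin/pg_ctl}
PGDATA=${PGDATA:-/var/lib/pgsql/data}
PGLOG=${PGLOG:-/var/log/postgresql}

# Build the command line that will be run as the postgres user.
start_command() {
    printf '%s -D %s -l %s start' "$PGCTL" "$PGDATA" "$PGLOG"
}

# When invoked by root (as boot scripts are), su does not prompt for a password.
start_postgres() {
    if [ "$(id -u)" -eq 0 ]; then
        su postgres -c "$(start_command)"
    else
        echo "start_postgres: must be run as root" >&2
        return 1
    fi
}
```

Keeping the command in its own function makes it easy to print and inspect before wiring start_postgres into rc.local.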
On 27/07/2008, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Startup scripts invariably run as root, so 'su' isn't going to ask
for a password...
And it's nothing to worry about because the script he's using
is su-ing to the postgres user anyway ...
regards, tom lane
Cheers,
Andrej
On Sun, 27 Jul 2008, Andrej Ricnik-Bay wrote:
if [ -x /etc/rc.d/rc.postgres ]; then
/etc/rc.d/rc.postgres start
fi
to your /etc/rc.d/rc.local
Well, that's the problem, Andrej. I have that script, and it worked fine
with postgres-6.x through -8.1, but failed to correctly start the postmaster
after the system reboot.
I can try twiddling with the script; it calls pg_ctl, and that should
work, but apparently something broke last week.
Thanks,
Rich
On Sat, 26 Jul 2008, Tom Lane wrote:
Startup scripts invariably run as root, so 'su' isn't going to ask for a
password...
Tom,
That occurred to me after I wrote the message. Think that I'll tune the
script to use a command that I know is working with 8.3.3.
Many thanks,
Rich
On 27/07/2008, Rich Shepard <rshepard@appl-ecosys.com> wrote:
Well, that's the problem, Andrej. I have that script, and it worked fine
with postgres-6.x through -8.1, but failed to correctly start the
postmaster after the system reboot.
I thought we had established that this issue was caused by
the current instance pointing at the old install's data directory?
I can try twiddling with the script; it calls pg_ctl, and that should
work, but apparently something broke last week.
That should be quite easy to tweak, really ... my current script
(slightly modified from the one in contrib/startup-scripts) is attached...
You may need to change the dirs in the script yet a bit.
Thanks,
Rich
Cheers,
Andrej
On Sun, 27 Jul 2008, Andrej Ricnik-Bay wrote:
I thought we had established that this issue was caused by the current
instance pointing at the old install's data directory?
No, that wasn't the problem.
If I use 'postgres -D /var/lib/pgsql/data &' the postmaster starts
correctly and everything runs as intended. If I use '/etc/rc.d/rc.postgresql
start' I get error messages about the postmaster already running and an
invalid .pid.
That should be quite easy to tweak, really ... my current script (slightly
modified from the one in contrib/startup-scripts) is attached... You may
need to change the dirs in the script yet a bit.
Thank you. I think that for some reason using pg_ctl to start the
postmaster is no longer working here. As I have time, I'll look into why.
Rich
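One thing worth ruling out when 'postgres -D' works but pg_ctl does not is that the two binaries come from different installations, since the old 8.1.13 tree is still on this machine. This is only a guess at a possible cause, not a diagnosis made in the thread. The sketch assumes both programs answer --version with a line of the form "name (PostgreSQL) x.y.z"; pg_version_field is a hypothetical helper.

```shell
# Pull the version number (third field) out of a "--version" line
# such as "pg_ctl (PostgreSQL) 8.3.3".
pg_version_field() {
    awk '{ print $3; exit }'
}

# Usage: compare the tools the startup script would actually run.
#   /usr/bin/pg_ctl --version | pg_version_field
#   postgres --version | pg_version_field
# Differing results would suggest that one of them belongs to the old
# installation.
```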