Cannot Start Postgres After System Boot
For reasons I do not understand, the Slackware start-up file for postgres
(/etc/rc.d/rc.postgresql) fails to work properly after I reboot the system.
(Reboots normally occur only after a kernel upgrade or with a hardware
failure that crashes the system.)
Trying to restart postgres manually (su postgres -c 'postgres -D
/var/lib/pgsql/data &') fails regardless of the presence of /tmp/.s.PGSQL.5432
and /var/lib/pgsql/postmaster.pid. Here's what I see:
[rshepard@salmo ~]$ su postgres -c 'postgres -D /var/lib/pgsql/data &'
Password:
[rshepard@salmo ~]$ LOG: could not bind IPv4 socket: Address already in use
HINT: Is another postmaster already running on port 5432? If not, wait a
few seconds and retry.
WARNING: could not create listen socket for "localhost"
FATAL: could not create any TCP/IP sockets
If someone would be kind enough to point out what I'm doing incorrectly
(e.g., removing /tmp/.s.PGSQL.5432 and postmaster.pid when the startup
process complains they're not right) I'll save this information for the next
time. I can also provide the 'start' section of the Slackware init file so I
could learn why it's not working properly.
TIA,
Rich
On 21 October 2010 11:53, Rich Shepard <rshepard@appl-ecosys.com> wrote:
If someone would be kind enough to point out what I'm doing incorrectly
(e.g., removing /tmp/.s.PGSQL.5432 and postmaster.pid when the startup
process complains they're not right) I'll save this information for the next
time. I can also provide the 'start' section of the Slackware init file so I
could learn why it's not working properly.
Please do - provide the section, I mean.
TIA,
Rich
Cheers,
Andrej
--
Please don't top post, and don't use HTML e-Mail :} Make your quotes concise.
On Thu, 21 Oct 2010, Andrej wrote:
Please do - provide the section, I mean.
Andrej,
The entire script is attached. It's only 2588 bytes.
Also, when there is no postmaster.pid or .s.PGSQL.5432 (and its lock file)
are these recreated automagically when postgres is properly loaded, or do I
need to do something first?
Many thanks,
Rich
Attachments:
rc.postgresql (text/plain; charset=US-ASCII)
Rich Shepard <rshepard@appl-ecosys.com> writes:
The entire script is attached. It's only 2588 bytes.
Personally, I'd drop all the machinations with checking the pidfile or
removing old socket files. The postmaster is fully capable of doing
those things for itself, and is much less likely to do them mistakenly
than this script is. In particular, I wonder whether the script's
refusal to start if the pidfile already exists accounts for your
report that it fails to auto-restart after a reboot.
IOW, this:
  else # remove old socket, if it exists and no daemon is running.
    if [ ! -f $DATADIR/$PIDFILE ]; then
      rm -f /tmp/.s.PGSQL.5432
      rm -f /tmp/.s.PGSQL.5432.lock
      # pg_ctl start -w -l $LOGFILE -D $DATADIR
      su postgres -c 'postgres -D /var/lib/pgsql/data &'
      exit 0
    else
      echo "PostgreSQL daemon was not properly shut down"
      echo "Please remove stale pid file $DATADIR/$PIDFILE"
      exit 7
    fi
  fi
could be reduced to just:
  else
    su postgres -c 'postgres -D /var/lib/pgsql/data &'
    exit 0
  fi
I'd also strongly recommend making that be "su - postgres -c ..."
rather than the way it is now; it's failing to ensure that the
postmaster is started with the postgres account's login settings.
I'm not sure about your comment that manual start attempts fail with
LOG: could not bind IPv4 socket: Address already in use
It's pretty hard to believe that that could occur on a freshly
booted system unless the TCP port was in fact already in use ---
ie, either there *is* a running postmaster, or something else is
using port 5432.
regards, tom lane
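
A minimal sketch of the reduced start branch with the "su - postgres" change applied, assuming the data directory used in this thread. The helper name start_cmd is hypothetical and is not part of the Slackware script. It only prints the command, so the result can be inspected before an init script (running as root) executes it.

```shell
#!/bin/sh
# Hypothetical helper: print the command the 'start' branch would run.
# "su - postgres" gives the postmaster the postgres account's login
# environment. No pidfile or socket cleanup is done here, because the
# postmaster deals with stale files itself.
start_cmd() {
    datadir=${1:-/var/lib/pgsql/data}
    printf "su - postgres -c 'postgres -D %s &'\n" "$datadir"
}

# Print the command for the default data directory.
start_cmd
```

Passing a different directory as the first argument changes only the -D value; everything else stays as suggested above.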
On 21 October 2010 16:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
could be reduced to just:
  else
    su postgres -c 'postgres -D /var/lib/pgsql/data &'
    exit 0
  fi
I'm not sure about your comment that manual start attempts fail with
LOG: could not bind IPv4 socket: Address already in use
It's pretty hard to believe that that could occur on a freshly
booted system unless the TCP port was in fact already in use ---
ie, either there *is* a running postmaster, or something else is
using port 5432.
I concur on both accounts; I would like to see the output of the
actual script, though, when it refuses to start; and also a
netstat -anp | grep 5432
Cheers,
Andrej
On 10/20/2010 6:53 PM, Rich Shepard wrote:
For reasons I do not understand, the Slackware start-up file for postgres
(/etc/rc.d/rc.postgresql) fails to work properly after I reboot the system.
(Reboots normally occur only after a kernel upgrade or with a hardware
failure that crashes the system.)
Trying to restart postgres manually (su postgres -c 'postgres -D
/var/lib/pgsql/data &') fails regardless of the presence of /tmp/.s.PGSQL.5432
and /var/lib/pgsql/postmaster.pid. Here's what I see:
[rshepard@salmo ~]$ su postgres -c 'postgres -D /var/lib/pgsql/data &'
Password:
[rshepard@salmo ~]$ LOG: could not bind IPv4 socket: Address already in use
HINT: Is another postmaster already running on port 5432? If not, wait a
few seconds and retry.
WARNING: could not create listen socket for "localhost"
FATAL: could not create any TCP/IP sockets
If someone would be kind enough to point out what I'm doing incorrectly
(e.g., removing /tmp/.s.PGSQL.5432 and postmaster.pid when the startup
process complains they're not right) I'll save this information for the next
time. I can also provide the 'start' section of the Slackware init file so I
could learn why it's not working properly.
TIA,
Rich
what does
$ netstat -an|grep 5432
return?
what does
$ ps -ef|grep post
return?
The above indicates that the tcp ipv4 socket is already bound by some process
On Wed, Oct 20, 2010 at 4:53 PM, Rich Shepard <rshepard@appl-ecosys.com> wrote:
For reasons I do not understand, the Slackware start-up file for postgres
(/etc/rc.d/rc.postgresql) fails to work properly after I reboot the system.
(Reboots normally occur only after a kernel upgrade or with a hardware
failure that crashes the system.)
Trying to restart postgres manually (su postgres -c 'postgres -D
/var/lib/pgsql/data &') fails regardless of the presence of /tmp/.s.PGSQL.5432
and /var/lib/pgsql/postmaster.pid. Here's what I see:
[rshepard@salmo ~]$ su postgres -c 'postgres -D /var/lib/pgsql/data &'
Password:
[rshepard@salmo ~]$ LOG: could not bind IPv4 socket: Address already in use
HINT: Is another postmaster already running on port 5432? If not, wait a
few seconds and retry.
WARNING: could not create listen socket for "localhost"
FATAL: could not create any TCP/IP sockets
Are you sure postgresql isn't getting started by some other init
script before this one runs? A warning that a port can't be bound
usually means just that: something else is on it. What does lsof tell
you is running on that port?
On Wed, 20 Oct 2010, Tom Lane wrote:
Personally, I'd drop all the machinations with checking the pidfile or
removing old socket files.
Tom,
I didn't write the script; whoever maintains the Slackware package for
PostgreSQL did. Regardless, I'll make the changes you suggest.
In particular, I wonder whether the script's refusal to start if the
pidfile already exists accounts for your report that it fails to
auto-restart after a reboot.
This clears up my uncertainty. The pidfile should not exist after a clean
shutdown, so it should be removed after a crash, too.
could be reduced to just:
  else
    su postgres -c 'postgres -D /var/lib/pgsql/data &'
    exit 0
  fi
I'd also strongly recommend making that be "su - postgres -c ..."
rather than the way it is now; it's failing to ensure that the
postmaster is started with the postgres account's login settings.
Done. I wondered about the 'su postgres' because when I run that on the
command line I'm asked for the postgres password. I suppose that since
root's running the init file it's not asked.
I'm not sure about your comment that manual start attempts fail with
LOG: could not bind IPv4 socket: Address already in use
It's pretty hard to believe that that could occur on a freshly
booted system unless the TCP port was in fact already in use ---
ie, either there *is* a running postmaster, or something else is
using port 5432.
I'm not seeing this now, but running the revised script (as root) still
produces this:
Starting PostgreSQL
3753
3755
3756
3757
3758
16481
PostgreSQL daemon already running
Warning: Missing pid file /var/lib/pgsql/data/postmaster.pid
Yet, when I try to access one of my databases I cannot:
[rshepard@salmo ~]$ psql aesi
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
There was no postgres running before I ran /etc/rc.d/rc.postgresql start.
There is also no socket on /tmp.
I'd greatly appreciate learning why the startup script is not working so I
can be confident that either the rc.postgresql file or my command line
invocation will consistently work properly to start the server. I will
provide whatever system information is needed to help diagnose and fix this
problem.
Many thanks,
Rich
On Thu, Oct 21, 2010 at 11:27 AM, Rich Shepard <rshepard@appl-ecosys.com> wrote:
Yet, when I try to access one of my databases I cannot:
[rshepard@salmo ~]$ psql aesi
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
So, what do
telnet localhost 5432
AND
psql -h localhost -l
do?
On Thu, 21 Oct 2010, Scott Marlowe wrote:
So, what do
telnet localhost 5432
Scott,
That port's clear:
[rshepard@salmo ~]$ telnet localhost 5432
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
AND
psql -h localhost -l
Huh!
[rshepard@salmo ~]$ psql -h localhost -l
List of databases
Name | Owner | Encoding
-----------+------------+----------
aesi | sql-ledger | LATIN1
cms | rshepard | UTF8
postgres | postgres | UTF8
refdb | postgres | UTF8
scirefs | rshepard | LATIN1
template0 | postgres | UTF8
template1 | postgres | UTF8
(7 rows)
So, why can't I connect to a database by entering, for example, 'psql
aesi'?
Thanks,
Rich
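
One way to read the two results above, sketched under the assumption that PGHOST is unset: libpq chooses the transport from the host setting. With no host it uses a Unix-domain socket in its compiled-in default directory, a host that begins with a slash names a socket directory, and anything else is treated as a TCP host. The helper name conn_kind is hypothetical and only mirrors that rule; it does not call psql.

```shell
#!/bin/sh
# Hypothetical helper: describe which transport a given -h value selects.
conn_kind() {
    case "$1" in
        "")  echo "unix socket in default directory" ;;
        /*)  echo "unix socket in $1" ;;
        *)   echo "tcp to $1" ;;
    esac
}

conn_kind ""           # corresponds to: psql aesi
conn_kind localhost    # corresponds to: psql -h localhost -l
```

So "psql aesi" looks for /tmp/.s.PGSQL.5432, while "psql -h localhost" never touches the socket file, which would explain why only the second form succeeds.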
--- On Thu, 10/21/10, Reid Thompson <reid.thompson@ateb.com> wrote:
From: Reid Thompson <reid.thompson@ateb.com>
Subject: Re: [GENERAL] Cannot Start Postgres After System Boot
To: "Rich Shepard" <rshepard@appl-ecosys.com>
Cc: pgsql-general@postgresql.org
Date: Thursday, October 21, 2010, 4:28 AM
On 10/20/2010 6:53 PM, Rich Shepard wrote:
For reasons I do not understand, the Slackware start-up file for postgres
(/etc/rc.d/rc.postgresql) fails to work properly after I reboot the system.
(Reboots normally occur only after a kernel upgrade or with a hardware
failure that crashes the system.)
Trying to restart postgres manually (su postgres -c 'postgres -D
/var/lib/pgsql/data &') fails regardless of the presence of /tmp/.s.PGSQL.5432
and /var/lib/pgsql/postmaster.pid. Here's what I see:
[rshepard@salmo ~]$ su postgres -c 'postgres -D /var/lib/pgsql/data &'
Password:
[rshepard@salmo ~]$ LOG: could not bind IPv4 socket: Address already in use
HINT: Is another postmaster already running on port 5432? If not, wait a
few seconds and retry.
WARNING: could not create listen socket for "localhost"
FATAL: could not create any TCP/IP sockets
If someone would be kind enough to point out what I'm doing incorrectly
(e.g., removing /tmp/.s.PGSQL.5432 and postmaster.pid when the startup
process complains they're not right) I'll save this information for the next
time. I can also provide the 'start' section of the Slackware init file so I
could learn why it's not working properly.
TIA,
Rich
what does
$ netstat -an|grep 5432
return?
what does
$ ps -ef|grep post
return?
The above indicates that the tcp ipv4 socket is already bound by some process
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Try to delete the files like this
.s.PGSQL.5432
.s.PGSQL.5432.lock
8.x-main.pid
and restart postmaster
On Thu, 21 Oct 2010, Reid Thompson wrote:
what does
$ netstat -an|grep 5432
return?
Reid,
[rshepard@salmo ~]$ netstat -an|grep 5432
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN
unix 3 [ ] STREAM CONNECTED 785432
what does
$ ps -ef|grep post
return?
The above indicates that the tcp ipv4 socket is already bound by some process
[rshepard@salmo ~]$ ps -ef|grep post
postgres 3753 1 0 Oct20 ? 00:00:00 postgres -D /var/lib/pgsql/data
postgres 3755 3753 0 Oct20 ? 00:00:00 postgres: writer process
postgres 3756 3753 0 Oct20 ? 00:00:00 postgres: wal writer process
postgres 3757 3753 0 Oct20 ? 00:00:00 postgres: autovacuum launcher process
postgres 3758 3753 0 Oct20 ? 00:00:00 postgres: stats collector process
root 4285 1 0 Oct19 ? 00:00:01 /usr/libexec/postfix/master
postfix 4287 4285 0 Oct19 ? 00:00:00 qmgr -l -t fifo -u
postfix 10143 4285 0 02:15 ? 00:00:00 anvil -l -t unix -u
postfix 16244 4285 0 10:01 ? 00:00:00 smtpd -n smtp -t inet -u -o stress
postfix 16245 4285 0 10:01 ? 00:00:00 trivial-rewrite -n rewrite -t unix -u
postfix 16246 4285 0 10:01 ? 00:00:00 smtpd -n smtp -t inet -u -o stress
postfix 16305 4285 0 10:06 ? 00:00:00 smtpd -n smtp -t inet -u -o stress
postfix 16426 4285 0 10:15 ? 00:00:00 smtpd -n smtp -t inet -u -o stress
postfix 16625 4285 0 10:31 ? 00:00:00 pickup -l -t fifo -u
postfix 16743 4285 0 10:38 ? 00:00:00 cleanup -z -t unix -u
postfix 16744 4285 0 10:38 ? 00:00:00 local -t unix
Yet I cannot connect to a database either from the command line or, in the
case of SQL-Ledger, from firefox:
Error!
could not connect to server: No such file or directory Is the server running
locally and accepting connections on Unix domain socket
"/tmp/.s.PGSQL.5432"?
Thanks,
Rich
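
The netstat output above includes a Unix-socket line that matches only because its inode number happens to end in 5432. A hedged sketch of a stricter check follows, assuming the Linux net-tools column layout shown in this thread. The helper name tcp_port_listening is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -an` output on stdin and succeed only
# if a TCP socket is listening on the given port. It compares the port part
# of the local-address column, so a Unix socket whose inode merely contains
# the digits 5432 is ignored.
tcp_port_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage (not run here):
#   netstat -an | tcp_port_listening 5432 && echo "port 5432 is in use"
```

Fed the two lines Rich posted, the function reports the port as in use because of the tcp line alone.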
On Thu, 21 Oct 2010, Lennin Caro wrote:
Try to delete the files like this
.s.PGSQL.5432
.s.PGSQL.5432.lock
8.x-main.pid
and restart postmaster
Lennin,
The sockets are not to be found.
Rich
On 10/21/2010 10:41 AM, Rich Shepard wrote:
On Thu, 21 Oct 2010, Reid Thompson wrote:
what does
$ netstat -an|grep 5432
return?
Reid,
[rshepard@salmo ~]$ netstat -an|grep 5432
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN
unix 3 [ ] STREAM CONNECTED 785432
what does
$ ps -ef|grep post
return?
The above indicates that the tcp ipv4 socket is already bound by some process
[rshepard@salmo ~]$ ps -ef|grep post
postgres 3753 1 0 Oct20 ? 00:00:00 postgres -D /var/lib/pgsql/data
postgres 3755 3753 0 Oct20 ? 00:00:00 postgres: writer process
postgres 3756 3753 0 Oct20 ? 00:00:00 postgres: wal writer process
postgres 3757 3753 0 Oct20 ? 00:00:00 postgres: autovacuum launcher process
postgres 3758 3753 0 Oct20 ? 00:00:00 postgres: stats collector process
root 4285 1 0 Oct19 ? 00:00:01 /usr/libexec/postfix/master
postfix 4287 4285 0 Oct19 ? 00:00:00 qmgr -l -t fifo -u
postfix 10143 4285 0 02:15 ? 00:00:00 anvil -l -t unix -u
postfix 16244 4285 0 10:01 ? 00:00:00 smtpd -n smtp -t inet -u -o stress
postfix 16245 4285 0 10:01 ? 00:00:00 trivial-rewrite -n rewrite -t unix -u
postfix 16246 4285 0 10:01 ? 00:00:00 smtpd -n smtp -t inet -u -o stress
postfix 16305 4285 0 10:06 ? 00:00:00 smtpd -n smtp -t inet -u -o stress
postfix 16426 4285 0 10:15 ? 00:00:00 smtpd -n smtp -t inet -u -o stress
postfix 16625 4285 0 10:31 ? 00:00:00 pickup -l -t fifo -u
postfix 16743 4285 0 10:38 ? 00:00:00 cleanup -z -t unix -u
postfix 16744 4285 0 10:38 ? 00:00:00 local -t unix
Yet I cannot connect to a database either from the command line or, in the
case of SQL-Ledger, from firefox:
Error!
could not connect to server: No such file or directory Is the server running
locally and accepting connections on Unix domain socket
"/tmp/.s.PGSQL.5432"?
Thanks,
Rich
What does your postgresql.conf file show for:
listen_addresses =
--
Adrian Klaver
adrian.klaver@gmail.com
On Thu, 21 Oct 2010, Adrian Klaver wrote:
What does your postgresql.conf file show for:
listen_addresses =
Adrian,
#listen_addresses = 'localhost' # what IP address(es) to listen on;
This hasn't changed.
Thanks,
Rich
Rich Shepard <rshepard@appl-ecosys.com> writes:
On Wed, 20 Oct 2010, Tom Lane wrote:
In particular, I wonder whether the script's refusal to start if the
pidfile already exists accounts for your report that it fails to
auto-restart after a reboot.
This clears up my uncertainty. The pidfile should not exist after a clean
shutdown, so it should be removed after a crash, too.
Actually, I was saying that the script should *not* concern itself with
the pidfile at all. Having a script that automatically removes the
pidfile is a big foot-gun: if you ever run it at any time other than
system boot, you'll destroy a critical interlock against starting two
postmasters in the same data directory. The postmaster is perfectly
capable of getting rid of a stale pidfile by itself, and is far less
likely to do the wrong thing than a scripted removal is.
Yet, when I try to access one of my databases I cannot:
[rshepard@salmo ~]$ psql aesi
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
There was no postgres running before I ran /etc/rc.d/rc.postgresql start.
There is also no socket on /tmp.
Hmm, maybe the postmaster thinks it should be putting the socket file
someplace other than /tmp. Have you got a nondefault setting of
unix_socket_directory in postgresql.conf? Also, if you're using the
distro's build of postgresql not your own, it's possible that the
compiled-in default for unix_socket_directory isn't /tmp --- though
the copy of libpq you're using seems to think it is /tmp. Maybe your
libpq came from someplace different than the postmaster executable?
regards, tom lane
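
A small sketch for checking the first possibility, assuming the 8.x parameter name unix_socket_directory used in this thread. The helper name socket_dir_from_conf is hypothetical. It prints the value of the last uncommented setting in the given file and prints nothing when the setting is commented out, in which case the compiled-in default applies.

```shell
#!/bin/sh
# Hypothetical helper: print the uncommented unix_socket_directory value
# from a postgresql.conf-style file, or nothing if it is not set there.
socket_dir_from_conf() {
    sed -n "s/^[[:space:]]*unix_socket_directory[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1" |
        tail -n 1
}

# Usage (not run here):
#   socket_dir_from_conf /var/lib/pgsql/data/postgresql.conf
```

Because TCP connections work here, running psql -h localhost -c "SHOW unix_socket_directory" should also report the setting the running server sees, though it may come back empty when the compiled-in default is in effect.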
On Thu, 2010-10-21 at 10:35 -0700, Rich Shepard wrote:
On Thu, 21 Oct 2010, Scott Marlowe wrote:
So, what do
telnet localhost 5432
Scott,
That port's clear:
[rshepard@salmo ~]$ telnet localhost 5432
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
AND
psql -h localhost -l
Huh!
[rshepard@salmo ~]$ psql -h localhost -l
List of databases
Name | Owner | Encoding
-----------+------------+----------
aesi | sql-ledger | LATIN1
cms | rshepard | UTF8
postgres | postgres | UTF8
refdb | postgres | UTF8
scirefs | rshepard | LATIN1
template0 | postgres | UTF8
template1 | postgres | UTF8
(7 rows)
So, why can't I connect to a database by entering, for example, 'psql
aesi'?
Thanks,
Rich
what does
$ netstat -an |grep 5432
return?
something is running on tcp port 5432
On Thu, Oct 21, 2010 at 11:35 AM, Rich Shepard <rshepard@appl-ecosys.com> wrote:
On Thu, 21 Oct 2010, Scott Marlowe wrote:
So, what do
telnet localhost 5432
Scott,
That port's clear:
[rshepard@salmo ~]$ telnet localhost 5432
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
So something IS attached and is answering the phone.
AND
psql -h localhost -l
Huh!
[rshepard@salmo ~]$ psql -h localhost -l
List of databases
So a postgres IS running on your machine. I put it to you it's not
running where you think it is.
On Thu, Oct 21, 2010 at 11:36 AM, Lennin Caro <lennin.caro@yahoo.com> wrote:
Try to delete the files like this
.s.PGSQL.5432
.s.PGSQL.5432.lock
8.x-main.pid
and restart postmaster
WHOA, never delete those files unless you're sure you've killed off
postgres first. Then and only then you can delete them and safely
restart. If you ever manage to bring up two postmasters on the same
store you've just destroyed your database.
--
To understand recursion, one must first understand recursion.
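
A cautious sketch of the "make sure postgres is dead first" check, assuming the data directory layout from this thread. The helper name postmaster_alive is hypothetical. It relies on the first line of postmaster.pid holding the postmaster's PID and deliberately removes nothing, since the postmaster can clear a stale pidfile on its own.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DATADIR/postmaster.pid exists and the PID
# on its first line belongs to a live process. Run it as root or postgres:
# for an unprivileged caller, kill -0 fails on a process owned by another
# user even though that process exists.
postmaster_alive() {
    pidfile="$1/postmaster.pid"
    [ -f "$pidfile" ] || return 1
    pid=$(head -n 1 "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric first line
    esac
    kill -0 "$pid" 2>/dev/null
}

# Usage (not run here):
#   postmaster_alive /var/lib/pgsql/data && echo "postmaster is running"
```

A successful result means the files belong to a running server and should be left alone.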
On Thu, 21 Oct 2010, Tom Lane wrote:
Actually, I was saying that the script should *not* concern itself with
the pidfile at all.
Tom,
I understood what you wrote.
Hmm, maybe the postmaster thinks it should be putting the socket file
someplace other than /tmp. Have you got a nondefault setting of
unix_socket_directory in postgresql.conf?
No. It's been commented out forever, so it should be the default.
Also, if you're using the distro's build of postgresql not your own, it's
possible that the compiled-in default for unix_socket_directory isn't /tmp
--- though the copy of libpq you're using seems to think it is /tmp.
The currently installed 8.3.3 has been running for some time now. I've not
made any changes since last Friday (the last day I used one of the
databases), and the system board failed Sunday afternoon, just after an OS
upgrade.
Maybe your libpq came from someplace different than the postmaster
executable?
I've no idea how that could have happened.
Since I cannot start the postmaster I cannot run pg_dumpall. What's the
pragmatic way for me to once again get postgres running (and, presumably,
able to cleanly stop and restart when necessary)?
Many thanks,
Rich