Missing domain socket after reboot.

Started by Bill Moseley, almost 20 years ago (3 messages, general)
#1 Bill Moseley
moseley@hank.org

After a reboot today PostgreSQL 8.1 came back up and started
accepting connections over TCP, but the Unix socket file was missing.

This is on Debian Stable, and I can't imagine what might have removed
the file.

Running psql I get:

$ psql test
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

Yep, missing:

$ ls -la /var/run/postgresql
total 8
drwxrwsr-x 2 postgres postgres 4096 2006-06-21 17:03 .
drwxr-xr-x 16 root root 4096 2006-06-21 21:10 ..

Config looks ok:

/etc/postgresql/8.1/main$ fgrep unix_socket_dir postgresql.conf
unix_socket_directory = '/var/run/postgresql'

Startup option:

$ ps ux -u postgres | grep unix_socket
postgres 1512 0.0 0.3 17564 3476 ? S 17:02 0:00 /usr/lib/postgresql/8.1/bin/postmaster -D /var/lib/postgresql/8.1/main -c unix_socket_directory=/var/run/postgresql -c config_file=/etc/postgresql/8.1/main/postgresql.conf -c hba_file=/etc/postgresql/8.1/main/pg_hba.conf -c ident_file=/etc/postgresql/8.1/main/pg_ident.conf

Hum. lsof knows about the file.

$ lsof -p 1512 | grep /var/run
postmaste 1512 postgres 4u unix 0xf78b5980 1631 /var/run/postgresql/.s.PGSQL.5432

Any ideas what happened to the socket?

I had to stop and start the postmaster to get the socket back.
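
For what it's worth, the lsof output is consistent with the socket's path having been unlinked while the postmaster kept its listening descriptor open. Below is a minimal Python sketch of that behaviour, assuming Linux, using a temporary directory and a made-up socket name. Nothing here touches PostgreSQL, and `demo_unlinked_socket` is a hypothetical helper written only for this illustration.

```python
import os
import socket
import tempfile


def demo_unlinked_socket():
    """Bind a Unix socket, unlink its path, then try to connect by path.

    Returns (still_listening, error_name): whether the server descriptor is
    still open after the unlink, and the exception type a new client got.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical path for the demo; the real one in this thread is
        # /var/run/postgresql/.s.PGSQL.5432.
        path = os.path.join(tmpdir, ".s.DEMO.5432")

        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)

        # Simulate a cleanup job removing the file after the server started.
        os.unlink(path)

        # The server still holds a valid listening descriptor...
        still_listening = server.fileno() >= 0

        # ...but a new client can no longer find the socket by name.
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            client.connect(path)
            error_name = None
        except OSError as exc:
            error_name = type(exc).__name__
        finally:
            client.close()
            server.close()

        return still_listening, error_name
```

If that is what happened here, it would also explain why a stop and start fixed it: the new postmaster binds again and recreates the file.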

--
Bill Moseley
moseley@hank.org

#2 Doug McNaught
doug@mcnaught.org
In reply to: Bill Moseley (#1)
Re: Missing domain socket after reboot.

Bill Moseley <moseley@hank.org> writes:

> Hum. lsof knows about the file.
>
> $ lsof -p 1512 | grep /var/run
> postmaste 1512 postgres 4u unix 0xf78b5980 1631 /var/run/postgresql/.s.PGSQL.5432
>
> Any ideas what happened to the socket?

Maybe something in your bootup process tried to clean up /var/run and
deleted it after the postmaster had started?

> I had to stop and start the postmaster to get the socket back.

Be interesting to see if you can reproduce it...

-Doug

#3 Bill Moseley
moseley@hank.org
In reply to: Doug McNaught (#2)
Re: Missing domain socket after reboot.

On Thu, Jun 22, 2006 at 08:16:05AM -0400, Douglas McNaught wrote:

> Bill Moseley <moseley@hank.org> writes:
>
> > Hum. lsof knows about the file.
> >
> > $ lsof -p 1512 | grep /var/run
> > postmaste 1512 postgres 4u unix 0xf78b5980 1631 /var/run/postgresql/.s.PGSQL.5432
> >
> > Any ideas what happened to the socket?
>
> Maybe something in your bootup process tried to clean up /var/run and
> deleted it after the postmaster had started?

That's what I thought, but my quick look couldn't find anything in
the init scripts, not that that's conclusive:

$ fgrep /var/run * | grep rm
apache2: [ -f /var/run/apache2/ssl_scache ] && rm -f /var/run/apache2/*ssl_scache*
bootclean.sh: rm -f /var/run/.clean
bootmisc.sh:rm -f /tmp/.clean /var/run/.clean /var/lock/.clean
portmap: rm -f /var/run/portmap.upgrade-state
portmap: rm -f /var/run/portmap.state
rsync: rm -f /var/run/rsync.pid
rsync: rm -f /var/run/rsync.pid
rsync: rm -f /var/run/rsync.pid
umountnfs.sh:rm -f /tmp/.clean /var/lock/.clean /var/run/.clean
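
One caveat about the grep above: it only matches lines where `/var/run` and `rm` appear together. A cleanup that does `cd /var/run` and then deletes with relative paths, or that uses `find ... -delete`, would slip through. Here is a rough Python sketch of a looser check, assuming the scripts are plain text. `suspicious_scripts` and both patterns are my own illustration, not anything Debian ships, and the sample script contents are made up.

```python
import re

# Patterns are assumptions for illustration: any mention of /var/run, and any
# command that could delete files (rm, or find with -delete).
RUN_DIR = re.compile(r"/var/run\b")
DELETES = re.compile(r"\brm\b|-delete\b")


def suspicious_scripts(scripts):
    """Return names of scripts that mention /var/run and delete something,
    even if the two never appear on the same line.

    `scripts` maps a script name to its full text.
    """
    hits = []
    for name, text in sorted(scripts.items()):
        if RUN_DIR.search(text) and DELETES.search(text):
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Made-up script bodies, only to show the difference from a per-line grep.
    example = {
        "cleaner": "cd /var/run\nfind . ! -type d -delete\n",
        "harmless": "echo /var/run\n",
    }
    print(suspicious_scripts(example))
```

This will produce false positives, so it is only useful for narrowing down which scripts to read by hand.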

But maybe postgresql is started too early.

$ ls /etc/rc?.d | grep postgres | head -1
K20postgresql-8.1
K20postgresql-8.1
S20postgresql-8.1
S20postgresql-8.1
S20postgresql-8.1
S20postgresql-8.1
K20postgresql-8.1

Apache, for example, starts S91.

/etc/rc2.d:
K10atd                    S20courier-imap      S20mysqld-helper      S21nfs-common
K10cron                   S20courier-imap-ssl  S20netatalk           S21quotarpc
K10syslog-ng              S20courier-mta       S20nfs-kernel-server  S23ntp-server
S10sysklogd               S20courier-pop       S20ntop               S25mdadm
S11klogd                  S20courier-pop-ssl   S20oidentd            S30sysctl
S14ppp                    S20darwinss          S20postfix            S89cron
S15logical                S20exim4             S20postgresql-8.1     S91apache2
S16mountnfsforlogical.sh  S20grlogcheck        S20rmnologin          S91ifp_httpd
S18atd                    S20httpd             S20rsync              S99jabber
S18portmap                S20httpd2            S20saslauthd          S99stop-bootlogd
S19spamassassin           S20inetd             S20ssh                S99ud
S19syslog-ng              S20jabber            S20syslog-ng
S20binfmt-support         S20makedev           S20sysstat
S20courier-authdaemon     S20mysqld            S20xmail
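
To make the ordering argument concrete: within a runlevel directory the S links are run in name order, so the two-digit number decides who starts first. The small Python sketch below parses names like the ones in the listing. `start_order` is a hypothetical helper, and it assumes the classic `S<NN><name>` SysV link naming.

```python
import re

# Classic SysV link naming: S or K, a two-digit sequence number, then the name.
LINK = re.compile(r"^([SK])(\d{2})(.+)$")


def start_order(links):
    """Return (sequence, name) pairs for the S links, in the order they run.

    Assumes links within a runlevel directory run in sorted name order.
    K links and names that do not fit the pattern are skipped.
    """
    started = []
    for link in sorted(links):
        match = LINK.match(link)
        if match and match.group(1) == "S":
            started.append((int(match.group(2)), match.group(3)))
    return started


if __name__ == "__main__":
    # A few names taken from the rc2.d listing in this message.
    sample = ["S91apache2", "S20postgresql-8.1", "S10sysklogd", "K10cron"]
    for seq, name in start_order(sample):
        print(seq, name)
```

This does not show what removed the socket. It only shows that postgresql-8.1 starts in the S20 group, well before S91apache2.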

> Be interesting to see if you can reproduce it...

Next reboot I'll look again. It's a production machine so I can't
really bring it up one service at a time.

--
Bill Moseley
moseley@hank.org