Bug #882: Cannot manually log in to database.

Started by PostgreSQL Bugs List | about 23 years ago | 9 messages | bugs
#1 PostgreSQL Bugs List
pgsql-bugs@postgresql.org

Ben Kinsey (benk@aiinet.com; sk9887@sbc.com) reports a bug with a severity of 2
The lower the number the more severe it is.

Short Description
Cannot manually log in to database.

Long Description
We are receiving the following error when trying to manually log in to the database:

okapview# /opt/pgsql-7.1.3/bin/psql -U postgres -d AppliedView
psql: connectDBStart() -- connect() failed: No such file or directory
Is the postmaster running locally
and accepting connections on Unix socket '/tmp/.s.PGSQL.5432'?

We searched your documentation, and all it said was to verify that the postmaster daemon is running, which it already is on this system. We have a daemon process that is connected to the database, and its connection is not refused. Only psql command-line logins are refused.

Stopping and starting the postmaster daemon clears up this problem, but this problem creeps up about 2 times a week, and is a major annoyance.

Sample Code

No file was uploaded with this report

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: PostgreSQL Bugs List (#1)
Re: Bug #882: Cannot manually log in to database.

pgsql-bugs@postgresql.org writes:

okapview# /opt/pgsql-7.1.3/bin/psql -U postgres -d AppliedView
psql: connectDBStart() -- connect() failed: No such file or directory
Is the postmaster running locally
and accepting connections on Unix socket '/tmp/.s.PGSQL.5432'?

Stopping and starting the postmaster daemon clears up this problem, but this problem creeps up about 2 times a week, and is a major annoyance.

Sounds to me like you've got a cron script that removes everything in
tmp about twice a week. I suggest teaching it not to remove socket
files. On most Unixen the mod date on a socket file isn't changed by
normal activity, so a tmp-cleaner that only pays attention to the mod
date will mistakenly decide a socket is fair game for removal.

regards, tom lane

#3 Giles Lean
giles@nemeton.com.au
In reply to: PostgreSQL Bugs List (#1)
Re: Bug #882: Cannot manually log in to database.

[ Where *did* that Reply-To: line come from -- it's broken ...

repl: bad addresses:
benk@aiinet.com; -- extraneous semi-colon
]

Stopping and starting the postmaster daemon clears up this problem,
but this problem creeps up about 2 times a week, and is a major
annoyance.

Either teach your /tmp cleaner not to clean out the socket files as
Tom Lane suggested, or arrange to update the socket timestamps. I
think it's easier to just keep updating the timestamps -- then I don't
have to educate each new system administrator.

utimes("/tmp/.s.PGSQL.5432", (const struct timeval *) 0);

If you can't write that into C, drop me a line, and I'll send you the
code. Most touch(1) implementations would also do the right thing, so
you could try that too. Then put whatever solution you choose into
cron, and you're done.
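
For readers who want the call above wrapped in something they can build, here is a minimal sketch. It is not the code offered in this message; the function name touch_path is hypothetical, and the error handling is just one reasonable choice.

```c
#include <stdio.h>
#include <sys/time.h>

/* Set the access and modification times of path to the current time.
 * Returns 0 on success, -1 on failure (errno is set by utimes()).
 * (Hypothetical helper, for illustration only.) */
int touch_path(const char *path)
{
    /* A null times pointer asks for the current time. */
    if (utimes(path, NULL) != 0) {
        perror(path);
        return -1;
    }
    return 0;
}
```

A small program run from cron could call touch_path("/tmp/.s.PGSQL.5432") and exit non-zero on failure, so that a missing socket shows up in the cron mail.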

Regards,

Giles

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Giles Lean (#3)
Re: Bug #882: Cannot manually log in to database.

Giles Lean <giles@nemeton.com.au> writes:

Either teach your /tmp cleaner not to clean out the socket files as
Tom Lane suggested, or arrange to update the socket timestamps. I
think it's easier to just keep updating the timestamps -- then I don't
have to educate each new system administrator.

utimes("/tmp/.s.PGSQL.5432", (const struct timeval *) 0);

Hm, do you think that's portable?

There is already code in the postmaster to touch the socket lock file
every few minutes, so as to keep tmp-cleaners from zapping it. (Or at
least there once was; I can't find it right now.) If we could do the
same for the socket file it'd be really nice. But I didn't think there
was any portable way to update the mod timestamp on a socket.

regards, tom lane

#5 Kinsey, Ben
BenK@aiinet.com
In reply to: Tom Lane (#4)
Re: Bug #882: Cannot manually log in to database.

Here's a little more detail as to how this socket file was getting deleted:

On the system I'm using, if you attempt to start postmaster when an instance
of it is already running, the socket file gets deleted. It was discovered
that upon bootup of the system, the postgres startup script was being
executed twice in the /sbin/rc3.d directory, and this was causing the socket
file to get deleted. It wasn't a cron job.

Ben Kinsey

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, January 24, 2003 11:04 AM
To: Giles Lean
Cc: sk9887@sbc.com; benk@aiinet.com; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Bug #882: Cannot manually log in to database.

Giles Lean <giles@nemeton.com.au> writes:

Either teach your /tmp cleaner not to clean out the socket files as
Tom Lane suggested, or arrange to update the socket timestamps. I
think it's easier to just keep updating the timestamps -- then I don't
have to educate each new system administrator.

utimes("/tmp/.s.PGSQL.5432", (const struct timeval *) 0);

Hm, do you think that's portable?

There is already code in the postmaster to touch the socket lock file every
few minutes, so as to keep tmp-cleaners from zapping it. (Or at least there
once was; I can't find it right now.) If we could do the same for the
socket file it'd be really nice. But I didn't think there was any portable
way to update the mod timestamp on a socket.

regards, tom lane

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kinsey, Ben (#5)
Re: Bug #882: Cannot manually log in to database.

"Kinsey, Ben" <BenK@aiinet.com> writes:

Here's a little more detail as to how this socket file was getting deleted:
On the system I'm using, if you attempt to start postmaster when an instance
of it is already running, the socket file gets deleted. It was discovered
that upon bootup of the system, the postgres startup script was being
executed twice in the /sbin/rc3.d directory, and this was causing the socket
file to get deleted. It wasn't a cron job.

The second postmaster launch was doing that? It should not, because it
should detect that there's already a postmaster before it starts messing
with the socket file. Perhaps there's a gratuitous "rm" of the socket
file or the socket lockfile in the startup script?

regards, tom lane

#7 Giles Lean
giles@nemeton.com.au
In reply to: Tom Lane (#4)
Re: Bug #882: Cannot manually log in to database.

Tom Lane <tgl@sss.pgh.pa.us> writes:

Giles Lean <giles@nemeton.com.au> writes:

utimes("/tmp/.s.PGSQL.5432", (const struct timeval *) 0);

Hm, do you think that's portable?

Hm ... yes, actually I do. I use it on HP-UX, and testing indicates
that it works on FreeBSD, Linux, NetBSD and Tru64 as well.

Thinking about it, a Unix domain socket has an entry in the filesystem
and thus an inode. utimes() operates on the inode so it makes sense to
me that this should Just Work.

While UNIX98 (aka the "Single Unix Standard, version 2") talks about a
"file" argument to utimes() it doesn't make any particular mention
about restrictions on what type of file, and the function needs to
work on some non-regular files such as device files to be useful.

There is already code in the postmaster to touch the socket lock file
every few minutes, so as to keep tmp-cleaners from zapping it. (Or at
least there once was; I can't find it right now.) If we could do the
same for the socket file it'd be really nice. But I didn't think there
was any portable way to update the mod timestamp on a socket.

I've done some testing today, and the test passed on everything I
tested it on:

FreeBSD 4.7-RELEASE alpha
HP-UX B.11.11 9000/800
HP-UX B.11.22 ia64
Linux 2.4.18-14 i686 # RedHat Linux 8.0
Linux 2.4.18-mckinley-smp ia64 # Debian GNU/Linux 3.0
NetBSD 1.6_STABLE i386
OSF1 V4.0 alpha # Tru64
OSF1 V5.1 alpha # Tru64

It's too hot here today to go outside but even so, that's enough
testing ...

I've attached the code I used. It was considered to work if utimes()
didn't return an error and if the st_mtime value returned by stat()
changed:

$ make socket_utimes
cc -O2 -o socket_utimes socket_utimes.c
$ ./socket_utimes socket
utimes() successfully changed a Unix domain socket mtime.
$ uname -srm
NetBSD 1.6_STABLE i386
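
The attachment itself is not reproduced in this archive. The following is only a sketch of how a test with the stated pass criteria (utimes() returns no error and the st_mtime reported by stat() changes) might be written. The function name socket_mtime_updates is hypothetical. As an assumption of this sketch, the socket is first backdated with an explicit timestamp so the check does not have to sleep.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/* Returns 1 if utimes() changes the mtime of a Unix domain socket bound
 * at path, 0 if it fails or leaves the mtime unchanged, and -1 if the
 * socket could not be created. (Not the original attachment.) */
int socket_mtime_updates(const char *path)
{
    struct sockaddr_un addr;
    struct timeval old_times[2] = { { 1000000, 0 }, { 1000000, 0 } };
    struct stat before, after;
    int fd, result = -1;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    unlink(path);               /* remove any leftover from an earlier run */
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0) {
        result = 0;
        /* Backdate the socket, then ask for "now" and compare. */
        if (utimes(path, old_times) == 0 && stat(path, &before) == 0 &&
            utimes(path, NULL) == 0 && stat(path, &after) == 0 &&
            after.st_mtime != before.st_mtime)
            result = 1;
        unlink(path);
    }
    close(fd);
    return result;
}
```

If the platform silently ignores utimes() on sockets, both calls leave the mtime alone and the function returns 0.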

If utimes() works on the other supported platforms that have Unix
domain sockets perhaps we can put the /tmp cleaners to rest for good.

Anyone willing to test AIX, IRIX, MacOS X, Solaris, or SCO Unix? I
don't expect the Windows ports with or without cygwin will support
Unix domain sockets, so they probably don't need testing. :-)

Regards,

Giles

P.S. http://www.testdrive.hp.com is great for quick portability
testing. It was a Compaq program that HP has expanded since their
merger. Highly recommended.

Attachments:

socket_utimes.c (text/plain; charset=us-ascii; name=socket_utimes.c)

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Giles Lean (#7)
Re: Bug #882: Cannot manually log in to database.

Giles Lean <giles@nemeton.com.au> writes:

utimes("/tmp/.s.PGSQL.5432", (const struct timeval *) 0);

Hm, do you think that's portable?

Hm ... yes, actually I do. I use it on HP-UX, and testing indicates
that it works on FreeBSD, Linux, NetBSD and Tru64 as well.

Thinking about it, a Unix domain socket has an entry in the filesystem
and thus an inode. utimes() operates on the inode so it makes sense to
me that this should Just Work.

Sure, the question was more about whether the system call exists
everywhere.

I've done some testing today, and the test passed on everything I
tested it on:

I can add HPUX 10.20, Mac OS X 10.2.3, and a pretty ancient Linux
(kernel 2.0.36, not sure of the exact distro) to the list of stuff
your test program seems to pass on.

If utimes() works on the other supported platforms that have Unix
domain sockets perhaps we can put the /tmp cleaners to rest for good.

My feeling is we may as well put it in. If it turns out we have
platforms without utimes(), we can put in a configure test and #ifdef
it. If the call doesn't exist or doesn't update the mod time as
expected, we're no worse off than before --- and for platforms where
it does work, this is a big win.

Thanks for looking into it! I'll work on applying the fix.

regards, tom lane

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Giles Lean (#7)
Re: Bug #882: Cannot manually log in to database.

Giles Lean <giles@nemeton.com.au> writes:

Tom Lane <tgl@sss.pgh.pa.us> writes:

utimes("/tmp/.s.PGSQL.5432", (const struct timeval *) 0);

Hm, do you think that's portable?

Hm ... yes, actually I do. I use it on HP-UX, and testing indicates
that it works on FreeBSD, Linux, NetBSD and Tru64 as well.

Some digging about on the net revealed that there is a very similar
function utime() that is POSIX-standard, whereas utimes() is not.

Accordingly, I bit the bullet and put in a configure test to see which
one(s) we have. With any luck, this will hold up through 7.4's port
testing.
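
A configure-driven choice between the two calls could look roughly like the sketch below. This is not the code that went into the postmaster. The macro names HAVE_UTIME and HAVE_UTIMES follow the usual configure naming convention but are assumptions here, as is the function name touch_socket_file.

```c
#include <stddef.h>

/* Normally these macros would come from a configure-generated header.
 * As an assumption, HAVE_UTIME is defined by hand when neither is set,
 * so that the sketch compiles on its own. */
#if !defined(HAVE_UTIME) && !defined(HAVE_UTIMES)
#define HAVE_UTIME 1
#endif

#ifdef HAVE_UTIME
#include <utime.h>
#else
#include <sys/time.h>
#endif

/* Reset the timestamps of path to the current time.
 * Returns 0 on success, -1 on error.
 * (Hypothetical helper, for illustration only.) */
int touch_socket_file(const char *path)
{
#ifdef HAVE_UTIME
    return utime(path, NULL);   /* POSIX call; null means "now" */
#else
    return utimes(path, NULL);  /* same meaning for a null pointer */
#endif
}
```

Preferring utime() when both exist matches the observation above that it is the POSIX-standard call.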

regards, tom lane