Warning: Don't delete those /tmp/.PGSQL.* files

Started by Joel Burton, about 25 years ago, 17 messages

jburton@scw.org

about 25 years ago

This is part question, part short, sad tale.

Working on my database, I had a view that would lock up the
machine (eats all available memory, soon goes belly-up.) Turned out
to be a recursive view: view A asked a question of view B that
asked view A. [is it possible for pgsql to detect this? I worry about
my users doing this.] [and, yes, I should use kernel-level controls to
make sure that the postmaster process can't use all available
resources; but hey, it's a development machine. ]

Anyway, as I was tracking down this problem, I couldn't restart
PostgreSQL if the machine had crashed and I had a /tmp/.PGSQL.*
file in the temp directory; it assumed that the socket was in use.
So, I began restarting pgsql w/a line like

rm -f /tmp/.PGSQL.* && postmaster -i >log 2>log &

Which works great. Except that I *kept* using this for two weeks
after the view problem (damn that bash up-arrow laziness!), and
yesterday, used it to restart PostgreSQL except (oops!) it was
already running.

Results: no database at all. All classes (tables/views/etc) returned
0 records (meaning that no tables showed up in psql's \d, since
pg_class returned nothing.)

I don't know enough about why -- the /tmp files appear to have a
length of 0, but pgsql seems to care a great deal about them.

[ I did have a very fresh pg_dumpall file--thank you, anacron--so I
lost about 30 minutes worth of work, but it would have been
everything if I never backed up. ]

My advice:

1) Use pg_dumpall.
2) Don't delete those /tmp files until you're *sure* you're out of Pg
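
A minimal sketch of advice 2, added for illustration and not taken from the thread: a guard that reports whether a postmaster still appears to be alive, so the socket files are only removed when it is not. It assumes the server records its PID on the first line of a postmaster.pid file in the data directory; check that this holds for your version. The name postmaster_alive is made up, and the socket path follows the /tmp/.s.PGSQL.5432 naming mentioned later in the thread.

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only if the PID recorded in the
# given pid file belongs to a process we are able to signal.
# Assumption: the first line of the file is the postmaster's PID.
postmaster_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1          # no pid file: assume not running
    pid=$(head -n 1 "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;           # empty or non-numeric: not a PID
    esac
    # kill -0 sends no signal; it only tests whether we could signal the PID.
    # It also fails for a live process owned by another user, so run this
    # as the same user that runs the postmaster.
    kill -0 "$pid" 2>/dev/null
}

# Usage (commented out so that pasting this block has no side effects):
# if postmaster_alive "$PGDATA/postmaster.pid"; then
#     echo "postmaster still running; leaving /tmp/.s.PGSQL.* alone" >&2
# else
#     rm -f /tmp/.s.PGSQL.* && postmaster -i >log 2>&1 &
# fi
```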

Anyone know what *happened* and *why*? Was there anything I
could have done?

Thanks!

[ I do read these lists, but always appreciate a cc on responses so I
don't accidentally miss them. TIA. ]
--
Joel Burton, Director of Information Systems -*- jburton@scw.org
Support Center of Washington (www.scw.org)

Tom Lane

tgl@sss.pgh.pa.us

about 25 years ago

In reply to: Joel Burton (#1)

Re: Warning: Don't delete those /tmp/.PGSQL.* files

"Joel Burton" <jburton@scw.org> writes:

Working on my database, I had a view that would lock up the
machine (eats all available memory, soon goes belly-up.) Turned out
to be a recursive view: view A asked a question of view B that
asked view A. [is it possible for pgsql to detect this?

It should have been detected --- there is a check in the rewriter that's
supposed to error out after ten recursive rewrite calls. Maybe that
logic is broken, or misses certain cases. Could you exhibit the views
that caused this behavior for you?
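
To illustrate the kind of check described here, a toy shell sketch follows. It is not the rewriter's code: the view names and the helpers view_source and expand_view are invented for the example, and only the limit of ten comes from the message above.

```sh
#!/bin/sh
MAX_REWRITE_DEPTH=10   # the limit of ten is taken from the message above

# Hypothetical catalog: prints the relation a view selects from,
# or nothing if the name is a plain table.
view_source() {
    case $1 in
        view_a) echo view_b ;;
        view_b) echo view_a ;;   # refers back to view_a: a cycle
        view_c) echo table_t ;;
    esac
}

# Follow view references until a plain table is reached; fail once the
# depth limit is hit instead of recursing forever.
expand_view() {
    name=$1
    depth=${2:-0}
    if [ "$depth" -ge "$MAX_REWRITE_DEPTH" ]; then
        echo "ERROR: rewrite depth limit exceeded at $name" >&2
        return 1
    fi
    next=$(view_source "$name")
    if [ -z "$next" ]; then
        echo "$name"             # plain table: expansion is finished
        return 0
    fi
    expand_view "$next" $((depth + 1))
}
```

Here "expand_view view_c" prints table_t, while "expand_view view_a" stops with the error and a non-zero status rather than looping.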

So, I began restarting pgsql w/a line like

rm -f /tmp/.PGSQL.* && postmaster -i >log 2>log &

Which works great. Except that I *kept* using this for two weeks
after the view problem (damn that bash up-arrow laziness!), and
yesterday, used it to restart PostgreSQL except (oops!) it was
already running.

Results: no database at all. All classes (tables/views/etc) returned
0 records (meaning that no tables showed up in psql's \d, since
pg_class returned nothing.)

Ugh. The reason that removing the socket file allowed a second
postmaster to start up is that we use an advisory lock on the socket
file as the interlock that prevents two PMs on the same port number.
Remove the socket file, poof no interlock.
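
The "poof" can be reproduced with any advisory lock, because the lock belongs to the opened file and not to its path name. The sketch below is an illustration only: it uses the flock(1) utility (util-linux or BusyBox, an assumption about your system), which is not necessarily the locking call the postmaster uses, and an ordinary file stands in for the socket. The function name demo_lock_lost is made up.

```sh
#!/bin/sh
# Runs in a subshell (note the parentheses) so that fd 9 and the temp
# directory do not leak into the caller.
demo_lock_lost() (
    dir=$(mktemp -d) || exit 1
    lockfile=$dir/fake.socket          # stand-in for /tmp/.s.PGSQL.5432

    exec 9>"$lockfile"                 # "first postmaster" opens the file
    flock -n 9 || exit 1               # ... and takes the advisory lock

    # "Second postmaster": a separate open of the same path is refused.
    if flock -n "$lockfile" true; then
        echo "before rm: second locker got in"
    else
        echo "before rm: second locker refused"
    fi

    rm -f "$lockfile"                  # path is gone; fd 9 still holds the old file

    # flock recreates the path as a brand-new file, which nobody has locked.
    if flock -n "$lockfile" true; then
        echo "after rm: second locker got in"
    else
        echo "after rm: second locker refused"
    fi

    exec 9>&-                          # release the first lock
    rm -rf "$dir"
)
```

On a system where flock behaves as described, the first attempt should be refused and the second should succeed, even though the first holder never released its lock.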

*However*, there is a second line of defense to prevent two postmasters
in the same directory, and I don't understand why that didn't trigger.
Unless you are running a version old enough to not have it. What PG
version is this, anyway?

Assuming you got past both interlocks, the second postmaster would have
reinitialized Postgres' shared memory block for that database, which
would have been a Bad Thing(tm) ... but it would not have led to any
immediate damage to your on-disk files, AFAICS. Was the database still
hosed after you stopped both postmasters and started a fresh one? (Did
you even try that?)

This story does indicate that we need a less fragile interlock against
starting two postmasters on one database. I have to admit that it
hadn't occurred to me that you could break the port-number interlock
so easily as that :-(. But obviously you can, so we need a different
way of representing the interlock. Hackers, any thoughts?

Note: I've narrowed followups to just pghackers, since that seems like
the right forum for discussing a better interlock mechanism.

regards, tom lane

Larry Rosenman

ler@lerctr.org

about 25 years ago

In reply to: Tom Lane (#2)

Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

* Tom Lane <tgl@sss.pgh.pa.us> [001125 16:37]:

"Joel Burton" <jburton@scw.org> writes:

This story does indicate that we need a less fragile interlock against
starting two postmasters on one database. I have to admit that it
hadn't occurred to me that you could break the port-number interlock
so easily as that :-(. But obviously you can, so we need a different
way of representing the interlock. Hackers, any thoughts?

how about a .pid/.port/.??? file in the /data directory, and a lock on that?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

grasshacker@over-yonder.net

about 25 years ago

In reply to: Tom Lane (#2)

Re: Warning: Don't delete those /tmp/.PGSQL.* files

On Sat, Nov 25, 2000 at 05:35:13PM -0500, some SMTP stream spewed forth:
*snip*

So, I began restarting pgsql w/a line like

rm -f /tmp/.PGSQL.* && postmaster -i >log 2>log &

Which works great. Except that I *kept* using this for two weeks
after the view problem (damn that bash up-arrow laziness!), and
yesterday, used it to restart PostgreSQL except (oops!) it was
already running.

Results: no database at all. All classes (tables/views/etc) returned
0 records (meaning that no tables showed up in psql's \d, since
pg_class returned nothing.)

*snip Tom's reply*
I have a situation vaguely related to this.
At some point Postgres was not shut down properly, and now every time at
startup the error log gets something like:

---------
root% tail -f errlog
Waiting for postmaster starting up...DEBUG: Data Base System is starting up at Sat Nov 25 16:53:10 2000
DEBUG: Data Base System was interrupted being in production at Sat Nov 25 16:35:27 2000
DEBUG: Data Base System is in production state at Sat Nov 25 16:53:10 2000
FATAL 1: ReleaseLruFile: No open files available to be closed
............................................................pg_ctl: postmaster does not start up
---------

After that, all postgres processes die and the cycle begins again on
subsequent attempts to start postgres.
At one point I would receive some "Too many open files" (or similar)
error with postgres holding more than 750 file descriptors -- almost
entirely consisting of socket streams.
What is the significance of "ReleaseLruFile" and how can I repair this?

This is using FreeBSD 4.1-RELEASE and Postgres 7.0.2.

Thanks

Tom Lane

tgl@sss.pgh.pa.us

about 25 years ago

In reply to: Larry Rosenman (#3)

Re: Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

Larry Rosenman <ler@lerctr.org> writes:

* Tom Lane <tgl@sss.pgh.pa.us> [001125 16:37]:

This story does indicate that we need a less fragile interlock against
starting two postmasters on one database. I have to admit that it
hadn't occurred to me that you could break the port-number interlock
so easily as that :-(. But obviously you can, so we need a different
way of representing the interlock. Hackers, any thoughts?

how about a .pid/.port/.??? file in the /data directory, and a lock on that?

Nope, 'cause it wouldn't protect you against two postmasters in
different data directories trying to use the same port number.
The port-number lock has to use a system-wide mechanism.

You may want to go back and review the previous threads that have
discussed interlock issues. We have really three independent resources
that we have to ensure only one postmaster is using at a time:

1. Port number (for Unix socket, IP address, etc)

2. Data directory (database files)

3. Shared memory.

Up to now shared memory has been protected more or less implicitly
by the port-number lock, since the shared memory IPC key is derived
from the port number. However, the "virtual host" patch that we
recently accepted (way prematurely IMHO) breaks that correspondence.
I suspect that we really ought to try to have an independent interlock
on the shared memory block itself. There was a thread around 4/30/00
concerning changing the way that shmem IPC keys are generated, and
maybe that would address this concern.

If we weren't relying on port number to protect shared memory, I think
the existing interlocks on port would be sufficient. The kernel
enforces an interlock on listening to the same IP address, so that's
OK, and an advisory lock on the socket file is OK for preventing two
postmasters from listening to the same socket file. (There's no real
reason to prevent postmasters from using similarly-socket-numbered
socket files in different directories, other than the shmem key issue,
so a lock on the socket file is really just what we want for that
specific resource.)

There is a related issue on my todo list, though --- didn't we find out
awhile back that some older Linux kernels crash and burn if one attempts
to get an advisory lock on a socket file? (See thread 7/6/00) Were we
going to fix that, and if so how? Or will we just tell people that they
have to update their kernel to run Postgres? The current configure
script "works around" this by disabling the advisory lock on *all*
versions of Linux, which I regard as a completely unacceptable
solution...

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

about 25 years ago

In reply to: GH (#4)

Re: Warning: Don't delete those /tmp/.PGSQL.* files

GH <grasshacker@over-yonder.net> writes:

FATAL 1: ReleaseLruFile: No open files available to be closed
............................................................pg_ctl:
postmaster does not start up

After that, all postgres processes die and the cycle begins again on
subsequent attempts to start postgres.
At one point I would receive some "Too many open files" (or similar)
error with postgres holding more than 750 file descriptors -- almost
entirely consisting of socket streams.
What is the significance of "ReleaseLruFile" and how can I repair this?

This is using FreeBSD 4.1-RELEASE and Postgres 7.0.2.

7.0.3 will probably help --- the message is coming out of some
inappropriate error recovery code that we fixed in 7.0.3.

The underlying problem, however, is that you are running out of kernel
file table slots (ENFILE or EMFILE error return from open()). Not
enough info here to tell why that's happening.

regards, tom lane

Peter Eisentraut

peter_e@gmx.net

about 25 years ago

In reply to: Tom Lane (#5)

Re: Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

Tom Lane writes:

There is a related issue on my todo list, though --- didn't we find out
awhile back that some older Linux kernels crash and burn if one attempts
to get an advisory lock on a socket file? (See thread 7/6/00) Were we
going to fix that, and if so how? Or will we just tell people that they
have to update their kernel to run Postgres? The current configure
script "works around" this by disabling the advisory lock on *all*
versions of Linux, which I regard as a completely unacceptable
solution...

Firstly, AFAIK there's no official production kernel that fixes this.
When and if it gets fixed we can change that logic.

I have simple test program that exhibits the problem (taken from the
kernel mailing list), but

a) You shouldn't run test programs in configure.

b) You really shouldn't run test programs in configure that set up
networking connections.

c) You definitely shouldn't run test programs in configure that provoke
kernel exceptions.

We could use flock() on Linux, though.

Maybe we could name the socket file .s.PGSQL.port.pid and make
.s.PGSQL.port a symlink. Then you can find out whether the postmaster
that created the file is still running. (You could even put the actual
socket file into the data directory, although that would require
re-thinking the file permissions on the latter.)

Actually, this turns out to be similar to what you wrote in
http://www.postgresql.org/mhonarc/pgsql-hackers/1998-08/msg00835.html

But we really should be fixing the IPC interlock with IPC_EXCL, but the
code changes look to be non-trivial.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

Tom Lane

tgl@sss.pgh.pa.us

about 25 years ago

In reply to: Peter Eisentraut (#7)

Re: Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

Peter Eisentraut <peter_e@gmx.net> writes:

Maybe we could name the socket file .s.PGSQL.port.pid and make
.s.PGSQL.port a symlink. Then you can find out whether the postmaster
that created the file is still running.

Or just create a lockfile /tmp/.s.PGSQL.port#.lock, ie, same name as
socket file with ".lock" added (containing postmaster's PID). Then we
could share code with the data-directory-lockfile case.
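
A rough shell sketch of the lockfile idea, added for illustration and assuming nothing about how the postmaster would actually implement it: create the file atomically, store the owner's PID in it, and treat a lock whose PID no longer exists as stale. The name acquire_pid_lock is made up, and the stale-lock takeover still has a small race between the liveness check and the removal.

```sh
#!/bin/sh
# acquire_pid_lock LOCKFILE PID
# Returns 0 if the lock was taken, 1 if a live process already owns it.
acquire_pid_lock() {
    lock=$1
    owner=$2
    tries=0
    while [ "$tries" -lt 2 ]; do
        # With noclobber (set -C) the redirection fails if the file
        # already exists, so creating the lock is one atomic step.
        if (set -C; echo "$owner" > "$lock") 2>/dev/null; then
            return 0
        fi
        holder=$(head -n 1 "$lock" 2>/dev/null)
        case $holder in
            ''|*[!0-9]*) ;;                                  # garbage: treat as stale
            *) kill -0 "$holder" 2>/dev/null && return 1 ;;  # owner is alive
        esac
        rm -f "$lock"                    # stale lock: remove it and retry once
        tries=$((tries + 1))
    done
    return 1
}

# Usage (commented out; the path follows the naming suggested above):
# acquire_pid_lock /tmp/.s.PGSQL.5432.lock "$$" || exit 1
```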

Actually, this turns out to be similar to what you wrote in
http://www.postgresql.org/mhonarc/pgsql-hackers/1998-08/msg00835.html

Well, we've talked before about moving the socket files to someplace
safer than /tmp. The problem is to find another place that's not
platform-dependent --- else you've got a major configuration headache.

But we really should be fixing the IPC interlock with IPC_EXCL, but the
code changes look to be non-trivial.

AFAIR the previous thread, it wasn't that bad, it was just a matter of
someone taking the time to do it. Maybe I'll have a go at it...

regards, tom lane

grasshacker@over-yonder.net

about 25 years ago

In reply to: Tom Lane (#6)

Re: Warning: Don't delete those /tmp/.PGSQL.* files

On Sat, Nov 25, 2000 at 06:40:12PM -0500, some SMTP stream spewed forth:

GH <grasshacker@over-yonder.net> writes:

FATAL 1: ReleaseLruFile: No open files available to be closed
............................................................pg_ctl:
postmaster does not start up

After that, all postgres processes die and the cycle begins again on
subsequent attempts to start postgres.
At one point I would receive some "Too many open files" (or similar)
error with postgres holding more than 750 file descriptors -- almost
entirely consisting of socket streams.
What is the significance of "ReleaseLruFile" and how can I repair this?

This is using FreeBSD 4.1-RELEASE and Postgres 7.0.2.

7.0.3 will probably help --- the message is coming out of some
inappropriate error recovery code that we fixed in 7.0.3.

The underlying problem, however, is that you are running out of kernel
file table slots (ENFILE or EMFILE error return from open()). Not
enough info here to tell why that's happening.

Well, through some research of my own I have discovered that the file
issue is somehow related to our startup script:
/usr/local/etc/rc.d/pgsql.sh.
I am not sure how familiar you are with FreeBSD's startup process, but
it will suffice to say that this script expects one of three arguments:
start, stop, or status -- apparently corresponding to the options of
pg_ctl.

When I start the postgres server manually, it runs relatively fine.
i.e.
# su -l pgsql /usr/local/pgsql/bin/pg_ctl -w start > /usr/local/pgsql/errlog 2>&1 &

Here is pgsql.sh:

#!/bin/sh

# $FreeBSD: ports/databases/postgresql7/files/pgsql.sh.tmpl,v 1.8 2000/05/25 09:35:25 andreas Exp $
#
# For postmaster startup options, edit $PGDATA/postmaster.opts.default
# Preinstalled options are -i -o "-F"

case $1 in
start)
[ -d /usr/local/pgsql/lib ] && /sbin/ldconfig -m /usr/local/pgsql/lib
# Clean up by Matt
# This is a really bad idea, unless we are absolutely certain that there
# are no postgres processes running or that we feel like restoring
# from a recent backup. ;-) gh
rm -f /tmp/.s.PGSQL*
[ -x /usr/local/pgsql/bin/pg_ctl ] && {
su -l pgsql \
/usr/local/pgsql/bin/pg_ctl -w start > /usr/local/pgsql/errlog 2>&1 &
# /usr/local/pgsql/bin/pg_ctl -w start -o "-B 64 -N 32" start > /usr/local/pgsql/errlog 2>&1 &
echo -n ' pgsql'
}
;;

stop)
[ -x /usr/local/pgsql/bin/pg_ctl ] && {
su -l pgsql -c 'exec /usr/local/pgsql/bin/pg_ctl -w -m fast stop'
}
;;

status)
[ -x /usr/local/pgsql/bin/pg_ctl ] && {
su -l pgsql -c 'exec /usr/local/pgsql/bin/pg_ctl status'
}
;;

*)
echo "usage: `basename $0` {start|stop|status}" >&2
exit 64
;;
esac

EOF

Running this script with "start" causes the postgres server to start,
run out of files, and then shut down. Postgres is usable until it runs
out of files and shuts down.

Thanks.


regards, tom lane

#10

Marko Kreen

marko@l-t.ee

about 25 years ago

In reply to: Tom Lane (#8)

Re: Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

On Sat, Nov 25, 2000 at 07:41:52PM -0500, Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

Actually, this turns out to be similar to what you wrote in
http://www.postgresql.org/mhonarc/pgsql-hackers/1998-08/msg00835.html

Well, we've talked before about moving the socket files to someplace
safer than /tmp. The problem is to find another place that's not
platform-dependent --- else you've got a major configuration headache.

Could this be described in e.g. /etc/postgresql/pg_client.conf?
a la the dbname idea?

I can't remember the exact terminology, but the idea is a
configuration file for clients, with its location set at compile
time, in which the connection params for clients are set.

---------

[db_foo]
type=inet
host=srv3.devel.net
port=1234
# there should be a way of specifying dbname later too
database=asdf

[db_baz]
type=unix
socket=/var/lib/postgres/comm/db_baz

--------
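
To make the proposal concrete, here is a small sketch of how a script might look up one setting from such a file. It is an illustration only: the file format is just the one proposed in this message, and the xconfig_get helper is hypothetical.

```sh
#!/bin/sh
# xconfig_get FILE SECTION KEY
# Prints the value of KEY inside [SECTION], or nothing if it is absent.
xconfig_get() {
    awk -v section="[$2]" -v key="$3" '
        /^#/  { next }                                # skip comment lines
        /^\[/ { in_section = ($0 == section); next }  # a new section starts
        in_section {
            eq = index($0, "=")
            if (eq > 0 && substr($0, 1, eq - 1) == key) {
                print substr($0, eq + 1)              # text after the first "="
                exit
            }
        }
    ' "$1"
}

# Usage (commented out; the path is the one suggested in this message):
# xconfig_get /etc/postgresql/pg_client.conf db_baz socket
```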

Also it should be possible to specify another configuration file
via env vars or command-line parameters.

Well, just an idea.

--
marko

#11

Tom Lane

tgl@sss.pgh.pa.us

about 25 years ago

In reply to: Marko Kreen (#10)

Re: Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

Marko Kreen <marko@l-t.ee> writes:

Well, we've talked before about moving the socket files to someplace
safer than /tmp. The problem is to find another place that's not
platform-dependent --- else you've got a major configuration headache.

Could this be described in e.g. /etc/postgresql/pg_client.conf?

The major objection to that is that if we rely on such a config file,
then you *cannot* install postgres without root permission (to make
the config file). Currently it's possible to fire up a test postmaster
without any special privileges whatever, and that's a nice feature.

A related objection is that such a file will itself become a source of
contention among multiple postmasters. Suppose I'm setting up a test
installation of a new version, while still running the prior release
as my main database. OK, I fire up the test postmaster on a different
port, and now I want to launch some of my usual clients for testing.
Oops, they connect to the old postmaster because that's what it says
to do in /etc/postgresql/pg_client.conf. I can't get them to connect
to the new postmaster unless I change /etc/postgresql/pg_client.conf,
which I *don't* want to do at this stage --- it'll break non-test
instances of these same clients.

I see some value in the pg_client.conf idea as a *per user* address
book, to shortcut full specification of all the databases that user
might want to connect to. As a system-wide configuration file, I think
it's a terrible idea.

regards, tom lane

#12

Marko Kreen

marko@l-t.ee

about 25 years ago

In reply to: Tom Lane (#11)

Re: Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

On Mon, Nov 27, 2000 at 11:05:40AM -0500, Tom Lane wrote:

Marko Kreen <marko@l-t.ee> writes:

Well, we've talked before about moving the socket files to someplace
safer than /tmp. The problem is to find another place that's not
platform-dependent --- else you've got a major configuration headache.

Could this be described in e.g. /etc/postgresql/pg_client.conf?

The major objection to that is that if we rely on such a config file,
then you *cannot* install postgres without root permission (to make
the config file). Currently it's possible to fire up a test postmaster
without any special privileges whatever, and that's a nice feature.

I do not see this as much of a problem, though.

[ I use the words XCONFIG and XNAME because I have no good idea
what they should be called. ]

server startup precedence:

1) postmaster --xconfig ./foo.cfg
2) PG_XCONFIG=./foo.cfg
3) /etc/postgresql/pg_xconfig (compile time spec)

there is also a thing 'xname' which is the section of config
file to use:

1) --xname foodb
2) PG_XNAME=foodb
3) default_xname specified in config.

so, client (libpq (psql)) startup:

1) psql --xconfig ./xxx
2) PG_XCONFIG=./xxx
3) ~/.pg_xconfig
4) /etc/postgresql/pg_xconfig

and xname as in server.
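
The client-side lookup order above could be sketched roughly like this. Everything here is hypothetical: XCONFIG is only a placeholder name in this proposal, and resolve_xconfig is an invented helper that takes the value of a --xconfig option (if any) as its argument.

```sh
#!/bin/sh
# resolve_xconfig [PATH_FROM_COMMAND_LINE]
# Prints the config file a client would use, in the order listed above.
resolve_xconfig() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"                        # 1) --xconfig argument
    elif [ -n "${PG_XCONFIG:-}" ]; then
        printf '%s\n' "$PG_XCONFIG"               # 2) environment variable
    elif [ -f "${HOME:-}/.pg_xconfig" ]; then
        printf '%s\n' "$HOME/.pg_xconfig"         # 3) per-user file, if present
    else
        printf '%s\n' /etc/postgresql/pg_xconfig  # 4) system-wide default
    fi
}
```

With no argument, no PG_XCONFIG, and no ~/.pg_xconfig, the function falls through to the system-wide file, so an unprivileged test setup only needs one of the first three.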

It may be better if server config is in a separate file because we
may want to give more options to the server (ipc keys, data dirs,
etc). But I guess it's simpler when they read the same file and
the client simply ignores server directives. And the server ignores
remote servers.

Also it should be possible to put all directives on the command
line too, for both client and server.

A related objection is that such a file will itself become a source of
contention among multiple postmasters. Suppose I'm setting up a test
installation of a new version, while still running the prior release
as my main database. OK, I fire up the test postmaster on a different
port, and now I want to launch some of my usual clients for testing.
Oops, they connect to the old postmaster because that's what it says
to do in /etc/postgresql/pg_client.conf. I can't get them to connect
to the new postmaster unless I change /etc/postgresql/pg_client.conf,
which I *don't* want to do at this stage --- it'll break non-test
instances of these same clients.

postmaster --xconfig ./test.cfg --xname testdb &
psql --xconfig ./test.cfg --xname testdb

I see some value in the pg_client.conf idea as a *per user* address
book, to shortcut full specification of all the databases that user
might want to connect to. As a system-wide configuration file, I think
it's a terrible idea.

So what do you think of the above idea?

--
marko

#13

Joel Burton

jburton@scw.org

about 25 years ago

In reply to: Tom Lane (#2)

Re: Warning: Don't delete those /tmp/.PGSQL.* files

On 25 Nov 2000, at 17:35, Tom Lane wrote:

So, I began restarting pgsql w/a line like

rm -f /tmp/.PGSQL.* && postmaster -i >log 2>log &

Which works great. Except that I *kept* using this for two weeks
after the view problem (damn that bash up-arrow laziness!), and
yesterday, used it to restart PostgreSQL except (oops!) it was
already running.

Results: no database at all. All classes (tables/views/etc) returned
0 records (meaning that no tables showed up in psql's \d, since
pg_class returned nothing.)

Ugh. The reason that removing the socket file allowed a second
postmaster to start up is that we use an advisory lock on the socket
file as the interlock that prevents two PMs on the same port number.
Remove the socket file, poof no interlock.

*However*, there is a second line of defense to prevent two
postmasters in the same directory, and I don't understand why that
didn't trigger. Unless you are running a version old enough to not
have it. What PG version is this, anyway?

7.1devel, from about 1 week ago.

Assuming you got past both interlocks, the second postmaster would
have reinitialized Postgres' shared memory block for that database,
which would have been a Bad Thing(tm) ... but it would not have led to
any immediate damage to your on-disk files, AFAICS. Was the database
still hosed after you stopped both postmasters and started a fresh
one? (Did you even try that?)

Yes, I stopped both, rebooted machine, restarted postmaster.
Rebooted machine, used just postgres, tried to vacuum, tried to
dump, etc. Always the same story.

--
Joel Burton, Director of Information Systems -*- jburton@scw.org
Support Center of Washington (www.scw.org)

#14

Tom Lane

tgl@sss.pgh.pa.us

about 25 years ago

In reply to: Joel Burton (#13)

Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

"Joel Burton" <jburton@scw.org> writes:

On 25 Nov 2000, at 17:35, Tom Lane wrote:

Ugh. The reason that removing the socket file allowed a second
postmaster to start up is that we use an advisory lock on the socket
file as the interlock that prevents two PMs on the same port number.
Remove the socket file, poof no interlock.

*However*, there is a second line of defense to prevent two
postmasters in the same directory, and I don't understand why that
didn't trigger. Unless you are running a version old enough to not
have it. What PG version is this, anyway?

7.1devel, from about 1 week ago.

Ah, I see why the data-directory interlock file wasn't helping: it
wasn't checked until *after* shared memory was set up (read clobbered
:-(). This was not a very bright choice. I'm still surprised that
the shared-memory reset should've trashed your database so thoroughly,
though.

Over the past two days I've committed changes that should make the data
directory, socket file, and shared memory interlocks considerably more
robust. In particular, mechanically doing "rm -f /tmp/.s.PGSQL.5432"
should never be necessary anymore.

Sorry about your trouble...

BTW, your original message mentioned something about a recursive view
definition that wasn't being recognized as such. Could you provide
details on that?

regards, tom lane

#15

Tom Lane

tgl@sss.pgh.pa.us

about 25 years ago

In reply to: GH (#9)

Re: Warning: Don't delete those /tmp/.PGSQL.* files

GH <grasshacker@over-yonder.net> writes:

Running this script with "start" causes the postgres server to start,
run out of files, and then shut down. Postgres is usable until it runs
out of files and shuts down.

Continuing on that line of thought --- it seems like this must be an
indication of a file-descriptor leak somewhere. That is, some bit of
code forgets to close a file it opened. Cycle through that bit of code
enough times, and the kernel stops being willing to give you more file
descriptors.

If this is correct, we could probably identify the leak by knowing what
file is being opened multiple times. Can you run 'lsof' or some similar
tool to check for duplicate descriptors being held open by the
postmaster?
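
One way to do that, sketched under the assumption that lsof is installed and supports its field-output option "-F n" (one line starting with "n" per open file name): count how often each name appears among the process's descriptors. count_open_names is a made-up helper, and it reads lsof output on stdin so it can also be tried on saved output.

```sh
#!/bin/sh
# count_open_names: reads "lsof -F n" output on stdin and prints
# "COUNT NAME" lines, most frequently opened names first.
count_open_names() {
    awk '/^n/ { count[substr($0, 2)]++ }    # strip the leading "n"
         END  { for (name in count) print count[name], name }' |
        sort -rn
}

# Usage (commented out; POSTMASTER_PID is a placeholder you must set):
# lsof -p "$POSTMASTER_PID" -F n | count_open_names | head -n 20
```

A name that shows up with a steadily growing count between runs would be a good candidate for the leaked descriptor.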

I recall that we have fixed one or two leaks of this kind in the past,
but I don't recall details, nor which versions the fixes first appeared
in.

regards, tom lane

#16

Joel Burton

jburton@scw.org

about 25 years ago

In reply to: Tom Lane (#14)

Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

Ah, I see why the data-directory interlock file wasn't helping: it
wasn't checked until *after* shared memory was set up (read clobbered
:-(). This was not a very bright choice. I'm still surprised that
the shared-memory reset should've trashed your database so thoroughly,
though.

Over the past two days I've committed changes that should make the
data directory, socket file, and shared memory interlocks considerably
more robust. In particular, mechanically doing "rm -f
/tmp/.s.PGSQL.5432" should never be necessary anymore.

That's fantastic. Thanks for the quick fix.

BTW, your original message mentioned something about a recursive view
definition that wasn't being recognized as such. Could you provide
details on that?

I can't. It's a few weeks ago, the database has been in furious
development, and, of course, I didn't bother to save all those views
that crashed my server. I keep trying to re-create it, but can't
figure it out. I'm sorry.

I think it wasn't just two views pointing at each other (it would, of
course, be next to impossible to even create those, unless you
hand-tweaked the system tables), but I think it was a
view-relies-on-a-function-relies-on-a-view kind of problem. If I ever
see it again, I'll save it.

Thanks!

--
Joel Burton, Director of Information Systems -*- jburton@scw.org
Support Center of Washington (www.scw.org)

#17

Tom Lane

tgl@sss.pgh.pa.us

about 25 years ago

In reply to: Joel Burton (#16)

Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

"Joel Burton" <jburton@scw.org> writes:

I think it wasn't just two views pointing at each other (it would, of
course, be next to impossible to even create those, unless you
hand-tweaked the system tables), but I think it was a
view-relies-on-a-function-relies-on-a-view kind of problem.

Oh, OK. I wouldn't expect the rewriter to realize that that sort of
situation is recursive. Depending on what your function is doing, it
might or might not be an infinite recursion, so I don't think I'd want
the system arbitrarily preventing you from doing this sort of thing.

Perhaps there should be an upper bound on function-call recursion depth
enforced someplace? Not sure.

regards, tom lane