pgsql: Repair two places where SIGTERM exit could leave shared memory
Log Message:
-----------
Repair two places where SIGTERM exit could leave shared memory state
corrupted. (Neither is very important if SIGTERM is used to shut down the
whole database cluster together, but there's a problem if someone tries to
SIGTERM individual backends.) To do this, introduce new infrastructure
macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care
of transiently pushing an on_shmem_exit cleanup hook. Also use this method
for createdb cleanup --- that wasn't a shared-memory-corruption problem,
but SIGTERM abort of createdb could leave orphaned files lying around.
Backpatch as far as 8.2. The shmem corruption cases don't exist in 8.1,
and the createdb usage doesn't seem important enough to risk backpatching
further.
Modified Files:
--------------
pgsql/src/backend/access/nbtree:
nbtree.c (r1.158 -> r1.159)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/nbtree/nbtree.c?r1=1.158&r2=1.159)
nbtutils.c (r1.88 -> r1.89)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/nbtree/nbtutils.c?r1=1.88&r2=1.89)
pgsql/src/backend/access/transam:
xlog.c (r1.296 -> r1.297)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c?r1=1.296&r2=1.297)
pgsql/src/backend/commands:
dbcommands.c (r1.205 -> r1.206)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/dbcommands.c?r1=1.205&r2=1.206)
pgsql/src/backend/port:
ipc_test.c (r1.24 -> r1.25)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/port/ipc_test.c?r1=1.24&r2=1.25)
pgsql/src/backend/storage/ipc:
ipc.c (r1.100 -> r1.101)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/storage/ipc/ipc.c?r1=1.100&r2=1.101)
pgsql/src/include/access:
nbtree.h (r1.117 -> r1.118)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/access/nbtree.h?r1=1.117&r2=1.118)
pgsql/src/include/storage:
ipc.h (r1.74 -> r1.75)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/storage/ipc.h?r1=1.74&r2=1.75)
pgsql/src/include/utils:
elog.h (r1.92 -> r1.93)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/utils/elog.h?r1=1.92&r2=1.93)
Tom Lane wrote:
Also use this method
for createdb cleanup --- that wasn't a shared-memory-corruption problem,
but SIGTERM abort of createdb could leave orphaned files lying around.
I wonder if we could use this mechanism for cleaning up in case of
failed CLUSTER, REINDEX or the like. I think these can leave dangling
files around.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
Tom Lane wrote:
Also use this method
for createdb cleanup --- that wasn't a shared-memory-corruption problem,
but SIGTERM abort of createdb could leave orphaned files lying around.
I wonder if we could use this mechanism for cleaning up in case of
failed CLUSTER, REINDEX or the like. I think these can leave dangling
files around.
They do clean up on abort or SIGTERM. If you experience a sudden power
loss, or kill -9 while CLUSTER or REINDEX is running, they will leave
behind dangling files, but that's a different problem. It's not limited
to utility commands like that either: if you create a table and copy a
few gigabytes of data into it in a transaction, and crash before
committing, you're left with a dangling file as well.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
Alvaro Herrera wrote:
Tom Lane wrote:
Also use this method
for createdb cleanup --- that wasn't a shared-memory-corruption problem,
but SIGTERM abort of createdb could leave orphaned files lying around.
I wonder if we could use this mechanism for cleaning up in case of
failed CLUSTER, REINDEX or the like. I think these can leave dangling
files around.
They do clean up on abort or SIGTERM.
Ah, we're OK then.
If you experience a sudden power loss, or kill -9 while CLUSTER or
REINDEX is running, they will leave behind dangling files, but that's
a different problem.
Sure, no surprises there.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
Heikki Linnakangas wrote:
Alvaro Herrera wrote:
Tom Lane wrote:
Also use this method
for createdb cleanup --- that wasn't a shared-memory-corruption problem,
but SIGTERM abort of createdb could leave orphaned files lying around.
I wonder if we could use this mechanism for cleaning up in case of
failed CLUSTER, REINDEX or the like. I think these can leave dangling
files around.
They do clean up on abort or SIGTERM.
Ah, we're OK then.
Wait, my memory failed me! No, we don't clean up dangling files on
SIGTERM. We should...
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
Alvaro Herrera wrote:
Heikki Linnakangas wrote:
Alvaro Herrera wrote:
Tom Lane wrote:
Also use this method
for createdb cleanup --- that wasn't a shared-memory-corruption
problem,
but SIGTERM abort of createdb could leave orphaned files lying around.
I wonder if we could use this mechanism for cleaning up in case of
failed CLUSTER, REINDEX or the like. I think these can leave dangling
files around.
They do clean up on abort or SIGTERM.
Ah, we're OK then.
Wait, my memory failed me! No, we don't clean up dangling files on
SIGTERM. We should...
No, wait, we do after all. I was fooled by the new 8.3 behavior to leave
the files dangling until next checkpoint. The files are not cleaned up
immediately on SIGTERM, but they are at the next checkpoint.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Thu, Apr 17, 2008 at 04:03:18PM +0300, Heikki Linnakangas wrote:
They do clean up on abort or SIGTERM. If you experience a sudden power
loss, or kill -9 while CLUSTER or REINDEX is running, they will leave
behind dangling files, but that's a different problem. It's not limited
to utility commands like that either: if you create a table and copy a
few gigabytes of data into it in a transaction, and crash before
committing, you're left with a dangling file as well.
Is this so? This happened to me the other day (hence my earlier question
about having COPY report failures sooner) because the disk filled up. I was
confused because du showed nothing. Eventually I ran lsof and found
that the postgres backend had a large number of open file handles to deleted
files (each one gigabyte).
So something certainly deletes them (though maybe not on Windows?)
before the transaction ends.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.
Martijn van Oosterhout <kleptog@svana.org> writes:
Is this so? This happened to me the other day (hence the question about
having COPY note failure earlier) because the disk filled up. I was
confused because du showed nothing. Eventually I did an lsof and found
the postgres backend had a large number of open file handles to deleted
files (each one gigabyte).
The backend, or the bgwriter? Please be specific.
The bgwriter should drop open file references after the next checkpoint,
but I don't recall any forcing function for regular backends to close
open files.
8.3 and HEAD should ftruncate() the first segment of a relation but I
think they just unlink the rest. Is it sane to think of ftruncate then
unlink on the non-first segments, to alleviate the disk-space issue when
someone else is holding the file open?
regards, tom lane
On Thu, Apr 17, 2008 at 11:48:41AM -0400, Tom Lane wrote:
Martijn van Oosterhout <kleptog@svana.org> writes:
Is this so? This happened to me the other day (hence the question about
having COPY note failure earlier) because the disk filled up. I was
confused because du showed nothing. Eventually I did an lsof and found
the postgres backend had a large number of open file handles to deleted
files (each one gigabyte).
The backend, or the bgwriter? Please be specific.
I believe the backend, because I was using lsof -p <pid> with the pid
copied from ps. But I can't be 100% sure.
8.3 and HEAD should ftruncate() the first segment of a relation but I
think they just unlink the rest. Is it sane to think of ftruncate then
unlink on the non-first segments, to alleviate the disk-space issue when
someone else is holding the file open?
It's possible. OTOH, if the copy error had been returned in the
PQputline() call, the driving program (which has several COPYs running at
once) would have aborted and the data would have been reclaimed
immediately. As it was, it kept going for an hour before noticing and
then dying (and cleaning everything up).
The one ftruncate does explain why there was some free space, so that
part is appreciated.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.