Re: Server instrumentation for 8.1

Started by Rod Taylor over 20 years ago, 2 messages
#1Rod Taylor
pg@rbt.ca

Exactly. In theory it probably works fine to allow one backend to exit
via kill -TERM, but it cannot be claimed that that behavior has been
tested to any significant extent --- "fast" shutdown is not stressing it
in the same way.

I think this is largely a question of someone doing a significant amount
of stress testing: gun live server processes with "kill -TERM" in an
active system, and keep an eye out for resource leaks, held locks, and
so on. It would be more convincing if the processes getting zapped are
executing a wide variety of SQL, too --- I'd not feel very confident
given only tests of killing, say, pgbench threads.

'Cause I know you won't be satisfied with anecdotal evidence, I thought I would
just say that I have done kills on specific backends in a high-load OLTP
process, with 1000+ active connections, for years and have not had a problem
with it yet.

Not that I wouldn't like to see some specific, thorough testing on the matter,
but I'm perfectly comfortable with the previously provided function.

I've also used it regularly for a few years with 100 active connections
in order to get rid of processes which were doing things they shouldn't
be, and have run into problems.

It seems about one out of every 20 kills of something holding a heavy
lock (VACUUM, ALTER TABLE, etc.) will result in a lock table corruption
being reported within the next few hours, although the pg_locks view
doesn't show anything interesting, nor do the locks appear to persist as
other processes can use the structures.

--

#2Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Rod Taylor (#1)

Well, that's clear evidence that the only way we are going to be able to
SIGTERM a backend is if it does a query cancel first, then terminates. I
don't think anything else is going to work cleanly.
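
As a rough illustration of the cancel-then-terminate ordering, here is a
minimal Python sketch done from outside the server. It relies on the fact
that a backend treats SIGINT as a query cancel and SIGTERM as a request to
exit. The function name, the grace period, and the injectable kill/sleep
parameters are hypothetical, made up for this example; it only approximates
the idea and is not how the server itself would implement it.

```python
import os
import signal
import time


def cancel_then_terminate(pid, grace_seconds=5.0, kill=os.kill, sleep=time.sleep):
    """Send SIGINT (query cancel) to a backend, wait, then send SIGTERM.

    Returns the list of signals that were actually delivered.
    All names and the grace period are assumptions for illustration.
    """
    sent = []
    try:
        # A PostgreSQL backend treats SIGINT as "cancel the current query".
        kill(pid, signal.SIGINT)
        sent.append(signal.SIGINT)

        # Hypothetical grace period so the cancelled query can unwind
        # and release its locks before the process is asked to exit.
        sleep(grace_seconds)

        # Only now ask the backend to terminate.
        kill(pid, signal.SIGTERM)
        sent.append(signal.SIGTERM)
    except ProcessLookupError:
        # The process is already gone; nothing more to do.
        pass
    return sent
```

The kill and sleep parameters are there only so the ordering can be
exercised without signalling a real process. A fixed sleep is the crudest
possible wait; it says nothing about whether the cancel actually finished.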

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073