transaction timeout

Started by Dr NoName · over 20 years ago · 30 messages · general
#1 Dr NoName
spamacct11@yahoo.com

Hi all,

I have been quite satisfied with the level of support
from the PostgreSQL community, but this time I'm
getting nothing. So, is transaction timeout option
planned at all? What's the alternative solution to a
client that's hung in transaction?

thanks,

Eugene

#2 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dr NoName (#1)
Re: transaction timeout

On Tue, Jul 26, 2005 at 07:40:30AM -0700, Dr NoName wrote:

So, is transaction timeout option planned at all?

What's the alternative solution to a client that's hung in
transaction?

Forcibly end it?

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)

#3 Dr NoName
spamacct11@yahoo.com
In reply to: Alvaro Herrera (#2)
Re: transaction timeout

Yeah, that's what we have to resort to now, but that's
not a solution. Until we kill the client, the entire
database is locked (or, at least the tables that other
clients need to write to, which is effectively the
same thing). This is annoying enough during the week
but it's especially a problem on weekends when none of
the developers are in the office.

A single client should not be able to bring the entire
database down. The DB should recognize that the client
went down and roll back the transaction. That would be
the ideal solution. Anything else we can do to remedy
the situation?

thanks,

Eugene

#4 Michael Fuhr
mike@fuhr.org
In reply to: Dr NoName (#1)
Re: transaction timeout

On Tue, Jul 26, 2005 at 07:40:30AM -0700, Dr NoName wrote:

I have been quite satisfied with the level of support
from the PostgreSQL community, but this time I'm
getting nothing.

There have been a couple of replies to your post, although perhaps
not what you were hoping for:

http://archives.postgresql.org/pgsql-general/2005-07/msg00984.php
http://archives.postgresql.org/pgsql-general/2005-07/msg00985.php

So, is transaction timeout option planned at all?

A search through the archives shows that transaction timeout has
been discussed recently in pgsql-hackers, but unless I've missed
something it hasn't been implemented yet (and therefore probably
won't be available in 8.1 since it's in feature freeze).

What's the alternative solution to a client that's hung in transaction?

What's the client doing that takes locks strong enough to "lock up
the entire database"? Why does the client hang? Since the database
doesn't currently have a way to detect and handle these situations,
it might be worthwhile to find out what's happening to see if it
can be prevented.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

#5 Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Dr NoName (#1)
Re: transaction timeout

On Tue, 2005-07-26 at 09:40, Dr NoName wrote:

Hi all,

I have been quite satisfied with the level of support
from the PostgreSQL community, but this time I'm
getting nothing. So, is transaction timeout option
planned at all? What's the alternative solution to a
client that's hung in transaction?

The common view on this kind of thing is that if your client is broken,
you need to fix it.

That said, I have seen some folks post about writing a perl or shell
script that runs every x minutes looking for connections that have been
idle for > a certain amount of time and kill the backend associated with
it (sigterm, not -9...)
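
A minimal sketch of such a script, assuming the 8.0-era process titles shown in the ps listings later in this thread (the function name is made up for illustration):

```shell
# A watchdog along the lines described above: find backends whose ps
# title ends in "idle in transaction" and SIGTERM them (never kill -9).
# The process-title format is an assumption, taken from the ps listings
# later in this thread.

idle_pids() {
    # stdin: "ps -ef"-style output; stdout: PIDs of idle-in-transaction backends
    grep 'postgres:.*idle in transaction' | awk '{print $2}'
}

# Intended use, e.g. from the postgres user's crontab every ten minutes:
#   ps -efwww | idle_pids | xargs -r kill -TERM
```

Note this takes no account of how long the backend has actually been idle; a fuller version would track that, for instance by comparing repeated samples.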

Generally speaking, I'd rather code my own solution to these problems
than do it the way Oracle does. Often times the choices someone else
makes for you in fixing these problems are suboptimal for your system.

#6 Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Dr NoName (#3)
Re: transaction timeout

On Tue, 2005-07-26 at 10:33, Dr NoName wrote:

Yeah, that's what we have to resort to now, but that's
not a solution. Until we kill the client, the entire
database is locked (or, at least the tables that other
clients need to write to, which is effectively the
same thing). This is annoying enough during the week
but it's especially a problem on weekends when none of
the developers are in the office.

OK, for the third or fourth time, what kind of locks is your application
taking out that can lock the whole database?

A single client should not be able to bring the entire
database down.

A single client running a large unconstrained join can easily bring
either PostgreSQL or Oracle to its knees. Their very nature, handling
hundreds of users accessing large amounts of data, makes databases prone
to such problems, and requires you to carefully design your applications
so as not to do things that cause the database to hiccup.

The DB should recognize that the client
went down and roll back the transaction.

How, exactly, can PostgreSQL (or any other database) recognize a hung
client versus one that's waiting for an hour on user input?

That would be
the ideal solution. Anything else we can do to remedy
the situation?

Yes, tell us what you're doing that "locks the whole database".

#7 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dr NoName (#3)
Re: transaction timeout

On Tue, Jul 26, 2005 at 08:33:19AM -0700, Dr NoName wrote:

A single client should not be able to bring the entire
database down. The DB should recognize that the client
went down and roll back the transaction. That would be
the ideal solution. Anything else we can do to remedy
the situation?

Now wait just a second. The database is not down at all just because
somebody left a transaction open. The real problem is that that open
transaction is having some resources locked, right?

I guess the real answer is not to leave transactions open. If you do
that by design, say because the app shows a data modification window,
and keeps a transaction open just to be able to save the changes later,
then you really need to rethink your app design.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)

#8 Dr NoName
spamacct11@yahoo.com
In reply to: Michael Fuhr (#4)
Re: transaction timeout

What's the client doing that takes locks strong enough to "lock up
the entire database"? Why does the client hang?

yeah, good question. I thought postgres uses
better-than-row-level locking? Could the total
deadlock be caused by a combination of an open
transaction and VACUUM FULL that runs every sunday?
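
One way to test that hypothesis the next time it wedges is to look at the lock table; a sketch (column names per the pg_locks view of the 8.x era, with a ::regclass cast added for readability):

```sql
-- Sketch: list which lock requests are not granted, and which sessions
-- hold locks on the same relations, so the blocker can be identified.
SELECT pid, relation::regclass AS rel, mode, granted
FROM pg_locks
WHERE relation IN (SELECT relation FROM pg_locks WHERE NOT granted)
ORDER BY rel, granted DESC;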

Since the database doesn't currently have a way to detect and handle
these situations, it might be worthwhile to find out what's happening
to see if it can be prevented.

*Anything* can happen. Like, for instance, last week a
user tried to kill the client and only managed to kill
some of the threads. But since the process was not
fully dead, the socket was not closed, so transaction
was still in progress. The point is that the client
must not be trusted to always do the right thing.
That's why we have things like protected memory,
pre-emptive multitasking, resource limits, etc.
Similarly a database must have the ability to detect a
broken client and kick it out.
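
For the fully-dead-peer case (a machine that crashes or drops off the network), OS-level TCP keepalive can eventually close the server side of the connection. A sketch of Linux settings, offered as an assumption about the platform rather than anything from this thread; note it would not have caught the half-dead client described above, whose socket stayed open:

```shell
# Tighten Linux TCP keepalive so dead peers are detected in minutes,
# not the default ~2 hours (system-wide; affects all TCP sockets).
sysctl -w net.ipv4.tcp_keepalive_time=300     # first probe after 5 min idle
sysctl -w net.ipv4.tcp_keepalive_intvl=30     # then probe every 30 s
sysctl -w net.ipv4.tcp_keepalive_probes=5     # declare dead after 5 failures
```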

thanks,

Eugene

#9 Dr NoName
spamacct11@yahoo.com
In reply to: Alvaro Herrera (#7)
Re: transaction timeout

On Tue, Jul 26, 2005 at 08:33:19AM -0700, Dr NoName wrote:

A single client should not be able to bring the entire
database down. The DB should recognize that the client
went down and roll back the transaction. That would be
the ideal solution. Anything else we can do to remedy
the situation?

Now wait just a second. The database is not down at all just because
somebody left a transaction open. The real problem is that that open
transaction is having some resources locked, right?

right, but that's effectively the same thing: users
cannot write to the database and in some cases can't
even read from it.

I guess the real answer is not to leave transactions open. If you do
that by design, say because the app shows a data modification window,
and keeps a transaction open just to be able to save the changes later,
then you really need to rethink your app design.

There is no user interaction in the middle of a
transaction. But there are other things we have to do
(file system I/O, heavy processing, etc.) Those
operations really do need to be interleaved with the
DB writes.

thanks,

Eugene

#10 Dr NoName
spamacct11@yahoo.com
In reply to: Scott Marlowe (#6)
Re: transaction timeout

OK, for the third or fourth time, what kind of locks is your application
taking out that can lock the whole database?

I'd like to know that myself. How can a select/insert/update lock an
entire table or even multiple tables?

How, exactly, can PostgreSQL (or any other database) recognize a hung
client versus one that's waiting for an hour on user input?

transaction timeout. In our application, one hour-long
transaction is not normal. I want it to abort
automatically.
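
One related knob that did exist at the time is statement_timeout, which aborts any single statement exceeding a limit. It does not cover a connection sitting idle inside an open transaction, which is the failure mode in this thread, but it at least bounds runaway queries. A sketch:

```sql
-- Abort any single statement running longer than an hour (per session
-- here; it can also be set in postgresql.conf for all sessions).
SET statement_timeout = 3600000;  -- milliseconds
```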

Yes, tell us what you're doing that "locks the whole database".

I wish I knew. Last Sunday it was locked up so badly that even selects
were blocked until we killed the offending client.

Eugene

#11 Dr NoName
spamacct11@yahoo.com
In reply to: Scott Marlowe (#5)
Re: transaction timeout

The common view on this kind of thing is that if your client is broken,
you need to fix it.

The problem is, we can't fix the users, nor can we fix
other software that our client has to interact with.
There will always be occasional situations when a
client gets stuck.

That said, I have seen some folks post about writing a perl or shell
script that runs every x minutes looking for connections that have been
idle for > a certain amount of time and kill the backend associated with
it (sigterm, not -9...)

what are the implications of killing a postmaster
process?

thanks,

Eugene

#12 Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Dr NoName (#11)
Re: transaction timeout

On Tue, 2005-07-26 at 11:24, Dr NoName wrote:

The common view on this kind of thing is that if your client is broken,
you need to fix it.

The problem is, we can't fix the users, nor can we fix
other software that our client has to interact with.
There will always be occasional situations when a
client gets stuck.

That said, I have seen some folks post about writing a perl or shell
script that runs every x minutes looking for connections that have been
idle for > a certain amount of time and kill the backend associated with
it (sigterm, not -9...)

what are the implications of killing a postmaster process?

A Sigterm is generally considered safe. It's -9 and its ilk that you
need to be wary of.

I think we and you both need more information about this failure. Do
you have any logging turned on that could give us a clue to what's
causing this failure?

It sounds to me more like one of the user apps is DOSing the server with
large unconstrained joins or something similarly dangerous to do.

#13 Frank L. Parks
fparks@ezbizpartner.com
In reply to: Dr NoName (#11)
Re: transaction timeout

TIP 2: Don't 'kill -9' the postmaster

#14 Magnus Hagander
magnus@hagander.net
In reply to: Frank L. Parks (#13)
Re: transaction timeout

That said, I have seen some folks post about writing a perl or shell
script that runs every x minutes looking for connections that have
been idle for > a certain amount of time and kill the backend
associated with it (sigterm, not -9...)

what are the implications of killing a postmaster process?

A Sigterm is generally considered safe. It's -9 and its ilk
that you need to be wary of.

No it's not. See the archives.
The only *safe* way to do it ATM is to restart the database. SIGTERM may
leave orphaned locks or such things in the system. (Incidentally, -9 on
a single backend should be safe I believe. The postmaster will tell all
concurrent connections to abort and restart. It's not nice, but it
should be safe - should perform normal recovery same as if you pull the
plug)

//Magnus

#15 Dr NoName
spamacct11@yahoo.com
In reply to: Magnus Hagander (#14)
Re: transaction timeout

If you have second database in the cluster is it still operational when
the main database locks up?

we don't have a DB cluster. It would be pretty useless
since postgresql doesn't support distributed
transactions.

Also it seems that some diagnostics are needed in the client app to log
the crash event so you can determine which SQL commands are causing the
lock.

I'll try to get that next time it happens. But regardless of which SQL
commands are running, I know what the root cause is: a client hangs
while in transaction.

Despite many years of writing buggy code I have not yet locked a whole
DB in the fashion described. I can not see how a simple select / insert
/ update command sequence can achieve it unless there is a particular
relation between the tables involved.

As I have already said, I suspect this might be caused
by a combination of an open transaction and a weekly
VACUUM FULL. Does that sound right?

If the tables are related / linked via rules / triggers / keys then
perhaps add a test table that bears no relation to the others and see if
it is locked when the others appear to have this problem you are
describing.

Perhaps a simple test: When the DB error occurs can you use PGAdmin to
read an independent table, or read from another database.

thanks, I'll try that.

Eugene

#16 Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Magnus Hagander (#14)
Re: transaction timeout

On Tue, 2005-07-26 at 12:41, Magnus Hagander wrote:

That said, I have seen some folks post about writing a perl or shell
script that runs every x minutes looking for connections that have
been idle for > a certain amount of time and kill the backend
associated with it (sigterm, not -9...)

what are the implications of killing a postmaster process?

A Sigterm is generally considered safe. It's -9 and its ilk
that you need to be wary of.

No it's not. See the archives.
The only *safe* way to do it ATM is to restart the database. SIGTERM may
leave orphaned locks or such things in the system. (Incidentally, -9 on
a single backend should be safe I believe. The postmaster will tell all
concurrent connections to abort and restart. It's not nice, but it
should be safe - should perform normal recovery same as if you pull the
plug)

Really? I was under the impression that doing a "kill <backendpid>" on
an idle connection would clean up those things. Was that a discussion
on hackers that brought this up? And if so, what was the time period,
I'd like to read through it.

#17 Magnus Hagander
magnus@hagander.net
In reply to: Scott Marlowe (#16)
Re: transaction timeout

That said, I have seen some folks post about writing a perl or shell
script that runs every x minutes looking for connections that have
been idle for > a certain amount of time and kill the backend
associated with it (sigterm, not -9...)

what are the implications of killing a postmaster process?

A Sigterm is generally considered safe. It's -9 and its ilk that
you need to be wary of.

No it's not. See the archives.
The only *safe* way to do it ATM is to restart the database. SIGTERM
may leave orphaned locks or such things in the system. (Incidentally,
-9 on a single backend should be safe I believe. The postmaster will
tell all concurrent connections to abort and restart. It's not nice,
but it should be safe - should perform normal recovery same as if you
pull the plug)

Really? I was under the impression that doing a "kill
<backendpid>" on an idle connection would clean up those
things. Was that a discussion on hackers that brought this
up? And if so, what was the time period, I'd like to read through it.

It's been brought up several times on hackers. Once back last autumn
when looking at the pg_terminate_backend function, and it was brought up
again this spring I think during one of the discussions about the
instrumentation patch.

I can't find the actual discussion at a quick look, but the quote from
Tom on http://archives.postgresql.org/pgsql-hackers/2005-06/msg00970.php
certainly shows there is a problem :-) The discussion should be
somewhere around that timeframe (I think before it - before I came up
with the idea to solve it that didn't work)

//Magnus

#18 Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Dr NoName (#15)
Re: transaction timeout

On Tue, 2005-07-26 at 12:51, Dr NoName wrote:

If you have second database in the cluster is it still operational when
the main database locks up?

we don't have a DB cluster. It would be pretty useless
since postgresql doesn't support distributed
transactions.

You misunderstood his point. In PostgreSQL parlance, a "cluster" is a
single postmaster running on a single machine, with 1 or more
databases. So, what he wanted to know was, if your application is
hitting a database called fred, and you have a spare database named
wilma, would "psql wilma" work when the database is "locked up?"

Can you elaborate on what you mean by a database that is "locked up?"

Also it seems that some diagnostics are needed in the client app to log
the crash event so you can determine which SQL commands are causing the
lock.

I'll try to get that next time it happens. But regardless of which SQL
commands are running, I know what the root cause is: a client hangs
while in transaction.

Actually, unless the client is holding a table level exclusive lock,
most transactions should continue.

Despite many years of writing buggy code I have not yet locked a whole
DB in the fashion described. I can not see how a simple select / insert
/ update command sequence can achieve it unless there is a particular
relation between the tables involved.

As I have already said, I suspect this might be caused
by a combination of an open transaction and a weekly
VACUUM FULL. Does that sound right?

No, VACUUM FULL shouldn't cause this kind of issue. Now, if the
database is just running real slow, instead of actually locking up,
that's possible with VACUUM FULL.

#19 Dr NoName
spamacct11@yahoo.com
In reply to: Scott Marlowe (#18)
Re: transaction timeout

You misunderstood his point. In PostgreSQL parlance, a "cluster" is a
single postmaster running on a single machine, with 1 or more
databases. So, what he wanted to know was, if your application is
hitting a database called fred, and you have a spare database named
wilma, would "psql wilma" work when the database is "locked up?"

ok, I see. That's another thing to try next time.

Can you elaborate on what you mean by a database that is "locked up?"

Here is the ps output from before the offending
process was killed. That one "idle in transaction"
process caused everything to lock.

2005 07 25 6:07:34 s17 79 > ps -efwww | grep postgres
postgres 23281 1 0 Jul18 ? 00:00:29
/usr/bin/postmaster -p 5432
postgres 23285 23281 0 Jul18 ? 00:20:20
postgres: stats buffer process
postgres 23287 23285 0 Jul18 ? 00:18:08
postgres: stats collector process
postgres 12466 23281 0 Jul18 ? 00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres 12474 23281 0 Jul18 ? 00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres 26947 23281 0 Jul19 ? 00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres 3514 23281 0 Jul19 ? 00:00:00
postgres: siam siam_production 172.16.11.50 idle
postgres 6881 23281 0 Jul19 ? 00:00:01
postgres: siam siam_production 172.16.11.71 idle
postgres 17750 23281 0 Jul20 ? 00:00:00
postgres: siam siam_production 172.16.10.159 idle in
transaction
postgres 26504 23281 0 Jul20 ? 00:00:00
postgres: siam siam_production 172.16.11.50 idle
postgres 12284 23281 0 Jul20 ? 00:00:00
postgres: siam siam_production 172.16.10.125 idle
postgres 16026 23281 0 Jul20 ? 00:00:00
postgres: siam siam_production 172.16.10.125 idle
postgres 25709 23281 0 Jul21 ? 00:01:00
postgres: siam siam_production 172.16.1.17 idle
postgres 27980 23281 0 Jul21 ? 00:04:08
postgres: siam siam_production 172.16.1.17 idle
postgres 14854 23281 0 Jul21 ? 00:00:03
postgres: siam siam_production 172.16.11.95 idle
postgres 19531 23281 0 Jul21 ? 00:00:02
postgres: siam siam_production 172.16.11.95 idle
postgres 17590 23281 0 Jul22 ? 00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres 26917 23281 0 Jul22 ? 00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26933 23281 0 Jul22 ? 00:06:57
postgres: siam siam_production 172.16.1.17 idle
postgres 26934 23281 0 Jul22 ? 00:00:56
postgres: siam siam_production 172.16.1.17 idle
postgres 26939 23281 0 Jul22 ? 00:01:09
postgres: siam siam_production 172.16.1.17 idle
postgres 27362 23281 0 Jul22 ? 00:01:56
postgres: siam siam_production 172.16.1.17 idle
postgres 27365 23281 0 Jul22 ? 00:01:03
postgres: siam siam_production 172.16.1.17 idle
postgres 27398 23281 0 Jul22 ? 00:00:26
postgres: siam siam_production 172.16.1.17 idle
postgres 27856 23281 0 Jul22 ? 00:01:00
postgres: siam siam_production 172.16.1.17 idle
postgres 27858 23281 0 Jul22 ? 00:05:26
postgres: siam siam_production 172.16.1.17 idle
postgres 27863 23281 0 Jul22 ? 00:00:58
postgres: siam siam_production 172.16.1.17 idle
postgres 27865 23281 0 Jul22 ? 00:01:28
postgres: siam siam_production 172.16.1.17 idle
postgres 27869 23281 0 Jul22 ? 00:00:29
postgres: siam siam_production 172.16.1.17 idle
postgres 28295 23281 0 Jul22 ? 00:00:23
postgres: siam siam_production 172.16.1.17 idle
postgres 28313 23281 0 Jul22 ? 00:00:45
postgres: siam siam_production 172.16.1.17 idle
postgres 28315 23281 0 Jul22 ? 00:01:06
postgres: siam siam_production 172.16.1.17 idle
postgres 28725 23281 0 Jul22 ? 00:05:07
postgres: siam siam_production 172.16.1.17 idle
postgres 13559 23281 0 Jul22 ? 00:00:24
postgres: siam siam_production 172.16.1.17 idle
postgres 13595 23281 0 Jul22 ? 00:00:36
postgres: siam siam_production 172.16.1.17 idle
postgres 14017 23281 0 Jul22 ? 00:00:52
postgres: siam siam_production 172.16.1.17 idle
postgres 25206 23281 0 Jul22 ? 00:00:00
postgres: siam siam_production 172.16.11.146 idle
postgres 3742 23281 0 Jul22 ? 00:00:00
postgres: siam siam_production 172.16.10.142 idle
postgres 12016 23281 0 Jul22 ? 00:00:00
postgres: siam siam_production 172.16.10.106 idle
postgres 13782 23281 0 Jul22 ? 00:00:13
postgres: siam siam_production 172.16.10.106 idle
postgres 13853 23281 0 Jul22 ? 00:00:02
postgres: siam siam_production 172.16.10.106 idle
postgres 14381 23281 0 Jul22 ? 00:00:00
postgres: siam siam_production 172.16.10.106 idle
postgres 14923 23281 0 Jul22 ? 00:00:00
postgres: siam siam_production 172.16.10.106 idle
postgres 17181 23281 0 Jul22 ? 00:00:00
postgres: siam siam_production 172.16.11.59 idle
postgres 6212 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production [local] VACUUM waiting
postgres 5952 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.1.17 SELECT
waiting
postgres 24644 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.100 SELECT
waiting
postgres 26271 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 26720 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 26721 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 27161 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 27162 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 28005 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 28450 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 28451 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 3049 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.11.134 SELECT
waiting
postgres 3875 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.11.134 SELECT
waiting
postgres 4286 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.11.130 SELECT
waiting
postgres 4700 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.11.185 SELECT
waiting
postgres 13850 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 13851 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 13852 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 13854 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 13855 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 13856 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.145 SELECT
waiting
postgres 14268 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 14269 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 14270 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.138 SELECT
waiting
postgres 14685 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 14686 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.11.24 SELECT
waiting
postgres 15100 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 15951 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.125 SELECT
waiting
postgres 16367 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.11.109 SELECT
waiting
postgres 25054 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.112 SELECT
waiting
postgres 25920 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.112 SELECT
waiting
postgres 25921 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.11.157 SELECT
waiting
postgres 25922 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.112 SELECT
waiting
postgres 26337 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.112 SELECT
waiting
postgres 26338 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.112 SELECT
waiting
postgres 10948 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.11.105 SELECT
waiting
postgres 12195 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.11.187 SELECT
waiting
postgres 12196 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.11.105 SELECT
waiting
postgres 22691 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.1.17 SELECT
waiting
postgres 23059 23281 0 Jul24 ? 00:00:00
postgres: siam siam_production 172.16.10.121 SELECT
waiting
postgres 15209 23281 0 04:00 ? 00:00:00
postgres: siam siam_production [local] SELECT waiting

Here is the ps output immediately after the hung
client was killed. As you can see, a whole shitload of
SELECTs suddenly woke up and finished. Unfortunately,
I don't have pg_stat_activity output.

2005 07 25 6:14:41 s17 78 > ps -efwww | grep postgres
postgres 23281 1 0 Jul18 ? 00:00:29
/usr/bin/postmaster -p 5432
postgres 23285 23281 0 Jul18 ? 00:20:21
postgres: stats buffer process
postgres 23287 23285 0 Jul18 ? 00:18:08
postgres: stats collector process
postgres 12466 23281 0 Jul18 ? 00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres 12474 23281 0 Jul18 ? 00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres 26947 23281 0 Jul19 ? 00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres 3514 23281 0 Jul19 ? 00:00:00
postgres: siam siam_production 172.16.11.50 idle
postgres 6881 23281 0 Jul19 ? 00:00:01
postgres: siam siam_production 172.16.11.71 idle
postgres 26504 23281 0 Jul20 ? 00:00:00
postgres: siam siam_production 172.16.11.50 idle
postgres 12284 23281 0 Jul20 ? 00:00:00
postgres: siam siam_production 172.16.10.125 idle
postgres 16026 23281 0 Jul20 ? 00:00:00 postgres: siam siam_production 172.16.10.125 idle
postgres 25709 23281 0 Jul21 ? 00:01:00 postgres: siam siam_production 172.16.1.17 idle
postgres 27980 23281 0 Jul21 ? 00:04:08 postgres: siam siam_production 172.16.1.17 idle
postgres 14854 23281 0 Jul21 ? 00:00:03 postgres: siam siam_production 172.16.11.95 idle
postgres 19531 23281 0 Jul21 ? 00:00:02 postgres: siam siam_production 172.16.11.95 idle
postgres 17590 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.11.95 idle
postgres 26917 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26933 23281 0 Jul22 ? 00:06:57 postgres: siam siam_production 172.16.1.17 idle
postgres 26934 23281 0 Jul22 ? 00:00:56 postgres: siam siam_production 172.16.1.17 idle
postgres 26939 23281 0 Jul22 ? 00:01:09 postgres: siam siam_production 172.16.1.17 idle
postgres 27362 23281 0 Jul22 ? 00:01:56 postgres: siam siam_production 172.16.1.17 idle
postgres 27365 23281 0 Jul22 ? 00:01:03 postgres: siam siam_production 172.16.1.17 idle
postgres 27398 23281 0 Jul22 ? 00:00:26 postgres: siam siam_production 172.16.1.17 idle
postgres 27856 23281 0 Jul22 ? 00:01:00 postgres: siam siam_production 172.16.1.17 idle
postgres 27858 23281 0 Jul22 ? 00:05:26 postgres: siam siam_production 172.16.1.17 idle
postgres 27863 23281 0 Jul22 ? 00:00:58 postgres: siam siam_production 172.16.1.17 idle
postgres 27865 23281 0 Jul22 ? 00:01:28 postgres: siam siam_production 172.16.1.17 idle
postgres 27869 23281 0 Jul22 ? 00:00:29 postgres: siam siam_production 172.16.1.17 idle
postgres 28295 23281 0 Jul22 ? 00:00:23 postgres: siam siam_production 172.16.1.17 idle
postgres 28313 23281 0 Jul22 ? 00:00:45 postgres: siam siam_production 172.16.1.17 idle
postgres 28315 23281 0 Jul22 ? 00:01:06 postgres: siam siam_production 172.16.1.17 idle
postgres 28725 23281 0 Jul22 ? 00:05:07 postgres: siam siam_production 172.16.1.17 idle
postgres 13559 23281 0 Jul22 ? 00:00:24 postgres: siam siam_production 172.16.1.17 idle
postgres 13595 23281 0 Jul22 ? 00:00:36 postgres: siam siam_production 172.16.1.17 idle
postgres 14017 23281 0 Jul22 ? 00:00:52 postgres: siam siam_production 172.16.1.17 idle
postgres 25206 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.11.146 idle
postgres 3742 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.10.142 idle
postgres 12016 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.10.106 idle
postgres 13782 23281 0 Jul22 ? 00:00:13 postgres: siam siam_production 172.16.10.106 idle
postgres 13853 23281 0 Jul22 ? 00:00:02 postgres: siam siam_production 172.16.10.106 idle
postgres 14381 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.10.106 idle
postgres 14923 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.10.106 idle
postgres 17181 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.11.59 idle
postgres 6212 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production [local] VACUUM waiting
postgres 5952 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 24644 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.100 idle
postgres 26721 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.114 idle
postgres 27161 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.114 idle
postgres 22691 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 15209 23281 0 03:59 ? 00:00:10 postgres: siam siam_production [local] COPY
postgres 26975 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26976 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26977 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26978 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26979 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26980 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26981 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26982 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26983 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26984 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26985 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26986 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26987 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26988 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26989 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle
postgres 26990 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 SELECT waiting
postgres 27041 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.11.130 SELECT waiting
costa 27091 26473 0 06:14 pts/0 00:00:00 grep postgres
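A long listing like the one above can be summarized mechanically rather than scanned by eye. A minimal sketch in Python (the `postgres: user database host state` process-title format is taken from the output above; the `summarize` helper is made up for illustration):

```python
import re

# Match the postgres process title shown in the ps output:
#   postgres: <user> <database> <host> <state...>
TITLE_RE = re.compile(r"postgres: (\S+) (\S+) (\S+) (.+)")

def summarize(lines):
    """Count backend states from postgres process-title lines."""
    counts = {}
    for line in lines:
        m = TITLE_RE.search(line)
        if not m:
            continue  # skip non-backend lines (e.g. the grep itself)
        state = m.group(4).strip()
        counts[state] = counts.get(state, 0) + 1
    return counts

sample = [
    "postgres: siam siam_production 172.16.10.125 idle",
    "postgres: siam siam_production [local] VACUUM waiting",
    "postgres: siam siam_production [local] COPY",
    "postgres: siam siam_production 172.16.1.17 SELECT waiting",
    "costa 27091 26473 0 06:14 pts/0 00:00:00 grep",
]
print(summarize(sample))
```

Run against the full listing, this makes the situation obvious at a glance: dozens of `idle` backends plus a handful stuck in `waiting` states.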

No, VACUUM FULL shouldn't cause this kind of issue. Now, if the database is just running real slow, instead of actually locking up, that's possible with vacuum full.

no, there was *zero* load on the server.

thanks,

Eugene


#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scott Marlowe (#16)
Re: transaction timeout

Scott Marlowe <smarlowe@g2switchworks.com> writes:

On Tue, 2005-07-26 at 12:41, Magnus Hagander wrote:

The only *safe* way to do it ATM is to restart the database. SIGTERM may
leave orphaned locks or such things in the system.

Really? I was under the impression that doing a "kill <backendpid>" on
an idle connection would clean up those things.

Well, it ought to, but I for one don't consider that code path
adequately tested --- and we have seen at least one report (from Rod
Taylor if memory serves) suggesting that there are in fact bugs in it.

We know that SIGTERM'ing all the backends at once leaves the database
with a good state on disk; that path is tested every time someone shuts
down Postgres. It does not follow that SIGTERM'ing a single backend
leaves consistent state in shared memory. Rod's report suggested a
corrupt lock table in particular.

Was that a discussion on hackers that brought this up? And if so,
what was the time period, I'd like to read through it.

There was a flamewar late during the 8.0 devel cycle where someone was
pushing to add a SIGTERM-one-backend function, and I was demurring that
I didn't think it was adequately tested. Subsequent events seem to have
proven that fear correct.

Eventually I'm sure we'll find and fix the problem, but at the moment
it's a risky thing to do.

regards, tom lane
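Given that killing a single backend was considered risky at the time, one workaround is to enforce the timeout on the client side, so a hung client tears itself down before holding locks indefinitely. A minimal sketch, not from the thread; the `TransactionWatchdog` name and callback are hypothetical, and a real client's abort callback would close its database connection, which makes the server roll back the open transaction and release its locks:

```python
import threading
import time

class TransactionWatchdog:
    """Client-side stand-in for a server transaction timeout: if end()
    is not called within `timeout` seconds of begin(), fire `abort`."""

    def __init__(self, timeout, abort):
        self.timeout = timeout
        self.abort = abort
        self._timer = None

    def begin(self):
        # Arm a one-shot timer when the transaction starts.
        self._timer = threading.Timer(self.timeout, self.abort)
        self._timer.daemon = True
        self._timer.start()

    def end(self):
        # Disarm on COMMIT/ROLLBACK; a finished transaction never aborts.
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None

# Demo with a dummy abort callback; a real client would close the
# connection here instead of appending to a list.
fired = []
watchdog = TransactionWatchdog(0.05, lambda: fired.append("aborted"))
watchdog.begin()
time.sleep(0.2)   # simulate a client hung mid-transaction
watchdog.end()
print(fired)      # the watchdog fired while the client was hung
```

This only guards against a hung process, not a machine that loses power mid-transaction; for that case the server's TCP keepalive settings are the usual lever.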

#21Martijn van Oosterhout
kleptog@svana.org
In reply to: Tom Lane (#20)
#22Dr NoName
spamacct11@yahoo.com
In reply to: Dr NoName (#19)
#23Paul Tillotson
spam1011@adelphia.net
In reply to: Dr NoName (#8)
#24Dr NoName
spamacct11@yahoo.com
In reply to: Paul Tillotson (#23)
#25Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Dr NoName (#24)
#26Dr NoName
spamacct11@yahoo.com
In reply to: Scott Marlowe (#25)
#27Paul Tillotson
spam1011@adelphia.net
In reply to: Dr NoName (#24)
#28Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dr NoName (#26)
#29Mike Fahrenkrog
mikef@travelpost.com
In reply to: Scott Marlowe (#18)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mike Fahrenkrog (#29)