total db lockup
(NOTE: reposting this for the *fifth* time because my previous messages didn't go through).
Hi all,
We have experienced a really weird problem with
postgresql yesterday. When I was called in to take a
look, all the non-superuser connections were used up
and they were all in a waiting state (SELECT waiting,
UPDATE waiting, etc.). I couldn't figure out what the
problem is, but I saved the ps, pg_stat_activity, and
pg_locks state at the time (attached as
condor_db_stats.txt). BTW, when looking at the queries
in pg_stat_activity, postgresql cuts them off so you
can't see the entire string. Is there any way to
prevent that, or at least increase the character
limit?
We restarted postgresql server and the problem
reappeared a few minutes later. I saved that state
also (condor_db_stats2.txt). At that point, after we
restarted postgresql (again), I ran VACUUM on the
entire database and did a few more things that seem to
have solved the problem (see below).
This particular database is essentially just one flat
table (level) with a few small supporting tables. Only
the level table is heavily used. (table definition is
attached as table.txt). I noticed that one of the
indexes (level_owner_index) was a hash index. I
remembered what postgresql manual says about hash
indexes and concurrency
(http://www.postgresql.org/docs/7.4/interactive/locking-indexes.html)
and, after VACUUM finished, replaced the hash index
with a btree. I then did a REINDEX of the level table
and ANALYZE. This seems to have solved the problem --
at least as of this morning we still have not seen any
deadlocks.
My question is, what could have caused this to happen?
Can anyone explain this paragraph from the manual:
"Share/exclusive page-level locks are used for
read/write access. Locks are released after the page
is processed. Page-level locks provide better
concurrency than index-level ones but are liable to
deadlocks."
Any other pointers to help me figure out what went
wrong and how to fix it?
thanks,
Eugene
WTF? My message doesn't appear. Trying again without
attachments or inline text.
So can anyone offer any insight on this? BTW, I tried reposting it with attachments and it didn't show up.
thanks,
Eugene
Show quoted text
From: <eugene1@sympatico.ca>
Date: 2005/08/18 Thu AM 09:24:30 EST
To: <pgsql-general@postgresql.org>
Subject: [GENERAL] total db lockup(NOTE: reposting this for the *fifth* time because my previous messages didn't go through).
Hi all,
We have experienced a really weird problem with
postgresql yesterday. When I was called in to take a
look, all the non-superuser connections were used up
and they were all in a waiting state (SELECT waiting,
UPDATE waiting, etc.). I couldn't figure out what the
problem is, but I saved the ps, pg_stat_activity, and
pg_locks state at the time (attached as
condor_db_stats.txt). BTW, when looking at the queries
in pg_stat_activity, postgresql cuts them off so you
can't see the entire string. Is there any way to
prevent that, or at least increase the character
limit?We restarted postgresql server and the problem
reappeared a few minutes later. I saved that state
also (condor_db_stats2.txt). At that point, after we
restarted postgresql (again), I ran VACUUM on the
entire database and did a few more things that seem to
have solved the problem (see below).This particular database is essentially just one flat
table (level) with a few small supporting tables. Only
the level table is heavily used. (table definition is
attached as table.txt). I noticed that one of the
indexes (level_owner_index) was a hash index. I
remembered what postgresql manual says about hash
indexes and concurrency
(http://www.postgresql.org/docs/7.4/interactive/locking-indexes.html)
and, after VACUUM finished, replaced the hash index
with a btree. I then did a REINDEX of the level table
and ANALYZE. This seems to have solved the problem --
at least as of this morning we still have not seen any
deadlocks.My question is, what could have caused this to happen?
Can anyone explain this paragraph from the manual:"Share/exclusive page-level locks are used for
read/write access. Locks are released after the page
is processed. Page-level locks provide better
concurrency than index-level ones but are liable to
deadlocks."Any other pointers to help me figure out what went
wrong and how to fix it?thanks,
Eugene
WTF? My message doesn't appear. Trying again without
attachments or inline text.---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Import Notes
Resolved by subject fallback
<eugene1@sympatico.ca> writes:
So can anyone offer any insight on this?
Not without information. You haven't even told us what PG version you
are running, much less provided the necessary details like the pg_locks
status.
BTW, I tried reposting it with attachments and it didn't show up.
Fix your mail setup and try again. Or wait a bit --- it's quite
possible the previous messages are just stuck in the moderator's
approval queue.
regards, tom lane
It seems that I can't. For whatever reason, my messages are being blocked.
Eugene
Show quoted text
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: 2005/08/18 Thu PM 02:11:54 EST
To: eugene1@sympatico.ca
CC: pgsql-general@postgresql.org
Subject: Re: [GENERAL] total db lockup
Attachments:
replyAlltext/plain; name=replyAllDownload
Import Notes
Resolved by subject fallback
Guys, I really need help on this. Can whoever is in charge of the mailing list change it so that it doesn't delete my messages?
thanks,
Eugene
Show quoted text
From: Eugene <eugene1@sympatico.ca>
Date: 2005/08/18 Thu PM 04:45:58 EST
To: Tom Lane <tgl@sss.pgh.pa.us>
CC: <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] total db lockupIt seems that I can't. For whatever reason, my messages are being blocked.
Eugene
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: 2005/08/18 Thu PM 02:11:54 EST
To: eugene1@sympatico.ca
CC: pgsql-general@postgresql.org
Subject: Re: [GENERAL] total db lockup
Import Notes
Resolved by subject fallback
Well, perhaps they were blocked for being too large?
Seriously, some of your messages appear to be getting through fine so
if you can't attach them put them on a web or ftp server and post a
link. Much better than copying it to several hundred mailboxes.
On Thu, Aug 18, 2005 at 05:20:55PM -0400, Eugene wrote:
Guys, I really need help on this. Can whoever is in charge of the
mailing list change it so that it doesn't delete my messages?
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.
On Thu, Aug 18, 2005 at 05:20:55PM -0400, Eugene wrote:
Guys, I really need help on this. Can whoever is in charge of the
mailing list change it so that it doesn't delete my messages?
I am not in charge of the mail server. However: May I suggest you
change to a less broken mail client? Your messages show up featuring
really strange MIME content description. Even the standard gmail.com
accounts seem to work better, which is surprising for a web based email
setup. This brokenness may hint the spam processor on postgresql.org
servers to silently droo your emails.
Alternatively, you may want to "paste" your error messages and stuff on
places like http://rafb.net/paste, and then post the URLs here.
--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"El sentido de las cosas no viene de las cosas, sino de
las inteligencias que las aplican a sus problemas diarios
en busca del progreso." (Ernesto Hern�ndez-Novich)
Thanks Alvaro.
Here it is again:
problem description: http://rafb.net/paste/results/bLAtIk26.html
db state before first restart: http://rafb.net/paste/results/D1Bqe125.html
db state before second restart: http://rafb.net/paste/results/D1Bqe125.html
table definition: http://rafb.net/paste/results/W35ccD49.html
thanks,
Eugene
Show quoted text
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: 2005/08/18 Thu PM 08:52:17 EST
To: Eugene <eugene1@sympatico.ca>
CC: pgsql-general@postgresql.org
Subject: Re: [GENERAL] total db lockupOn Thu, Aug 18, 2005 at 05:20:55PM -0400, Eugene wrote:
Guys, I really need help on this. Can whoever is in charge of the
mailing list change it so that it doesn't delete my messages?I am not in charge of the mail server. However: May I suggest you
change to a less broken mail client? Your messages show up featuring
really strange MIME content description. Even the standard gmail.com
accounts seem to work better, which is surprising for a web based email
setup. This brokenness may hint the spam processor on postgresql.org
servers to silently droo your emails.Alternatively, you may want to "paste" your error messages and stuff on
places like http://rafb.net/paste, and then post the URLs here.--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"El sentido de las cosas no viene de las cosas, sino de
las inteligencias que las aplican a sus problemas diarios
en busca del progreso." (Ernesto Hernández-Novich)
Import Notes
Resolved by subject fallback
On Fri, Aug 19, 2005 at 10:54:35AM -0400, Eugene wrote:
Thanks Alvaro.
Here it is again:
problem description: http://rafb.net/paste/results/bLAtIk26.html
db state before first restart: http://rafb.net/paste/results/D1Bqe125.html
db state before second restart: http://rafb.net/paste/results/D1Bqe125.html
table definition: http://rafb.net/paste/results/W35ccD49.html
Ok, so it seems the lockup occured only with the hash indexes? Then it
means we still have bugs in the locking code for those. It doesn't
surprise me. There's a reason they are not recommended, you know?
Let us know if you find the problem showing up again with btree indexes.
The hash indexes bugs should be fixed, but they are not high priority ...
--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"If it wasn't for my companion, I believe I'd be having
the time of my life" (John Dunbar)
actually, I'm using postgresql 7.3.2. I noticed there were some hash index fixes in 7.4, so that might be it. Would there be any other reason for this deadlock? This database has been in production for almost 2 years and this is the first time we've seen the problem.
thanks,
Eugene
Show quoted text
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: 2005/08/19 Fri AM 11:22:04 EST
To: Eugene <eugene1@sympatico.ca>
CC: pgsql-general@postgresql.org
Subject: Re: [GENERAL] total db lockupOn Fri, Aug 19, 2005 at 10:54:35AM -0400, Eugene wrote:
Thanks Alvaro.
Here it is again:
problem description: http://rafb.net/paste/results/bLAtIk26.html
db state before first restart: http://rafb.net/paste/results/D1Bqe125.html
db state before second restart: http://rafb.net/paste/results/D1Bqe125.html
table definition: http://rafb.net/paste/results/W35ccD49.htmlOk, so it seems the lockup occured only with the hash indexes? Then it
means we still have bugs in the locking code for those. It doesn't
surprise me. There's a reason they are not recommended, you know?Let us know if you find the problem showing up again with btree indexes.
The hash indexes bugs should be fixed, but they are not high priority ...--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"If it wasn't for my companion, I believe I'd be having
the time of my life" (John Dunbar)
Import Notes
Resolved by subject fallback
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Ok, so it seems the lockup occured only with the hash indexes? Then it
means we still have bugs in the locking code for those.
"Still" meaning "in the version he's using", which he hasn't told us
anywhere that I saw. (Internal evidence suggests it's 7.3 something)
It doesn't surprise me.
A deadlock in hash indexes wouldn't be surprising in pre-7.4 code,
since 7.4 was the first version that even pretended to be deadlock free
in hash indexes. But it does seem pretty surprising that the deadlock
checker didn't fire and boot somebody out of the deadlock. We haven't
had any bugs reported in deadlock detection in a very long time (since
7.1 I think). This could mean there's still something wrong in there.
regards, tom lane