I want tips for debugging deadlocks

Started by Hannu Krosing over 25 years ago | 4 messages | hackers
#1 Hannu Krosing
hannu@tm.ee

Hi,

I'm in a situation where I urgently need to debug PostgreSQL 7.0.2
for deadlocks that it does not notice/timeout

Where can I find info about running several concurrent backends
under a debugger?

-----------
Hannu

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#1)
Re: I want tips for debugging deadlocks

Hannu Krosing <hannu@tm.ee> writes:

> I'm in a situation where I urgently need to debug PostgreSQL 7.0.2
> for deadlocks that it does not notice/timeout

The most likely bet is that you are seeing deadlocks that involve a
buffer spinlock (LockBuffer() in bufmgr.c) --- there's no timeout or
deadlock detection check in that code. I have been suspicious for
some time that there are deadlocks possible there, but haven't had
any luck getting a reproducible example to study. (If you can present
a reproducible way to make the problem happen, please post it!)

> Where can I find info about running several concurrent backends
> under a debugger?

Just fire up N backends and attach to each one with N instances of gdb.
It's a little confusing but I've done it ...

regards, tom lane
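
The "N backends, N instances of gdb" approach above can be partly scripted. The following is a minimal sketch, not something posted in the thread. It assumes a gdb that accepts `-p`, `-batch` and `-ex`, and that you have already collected the backend PIDs yourself (for example from `ps`). The helper names `gdb_attach_command` and `attach_plan` and the PIDs are hypothetical.

```python
import shlex


def gdb_attach_command(pid, batch_backtrace=False):
    """Build the argv for attaching gdb to one running backend process."""
    argv = ["gdb", "-p", str(int(pid))]
    if batch_backtrace:
        # Non-interactive mode: print a backtrace, then exit and detach.
        argv += ["-batch", "-ex", "bt"]
    return argv


def attach_plan(pids, batch_backtrace=False):
    """Return one shell-quoted gdb command line per backend PID."""
    plan = []
    for pid in pids:
        argv = gdb_attach_command(pid, batch_backtrace)
        plan.append(" ".join(shlex.quote(arg) for arg in argv))
    return plan


if __name__ == "__main__":
    # Hypothetical PIDs; replace with the PIDs of your own backends.
    for line in attach_plan([12345, 12346], batch_backtrace=True):
        print(line)
```

Run each printed command in its own terminal for interactive sessions (drop `batch_backtrace`), or keep the batch form to grab a backtrace from every hung backend in one pass.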

#3 Fabrice Scemama
fabrices@ximmo.ftd.fr
In reply to: Hannu Krosing (#1)
Re: I want tips for debugging deadlocks

By the way, we finally understood that our main problem,
the one that was making our Pg hang forever, comes from
a deadlock. The same as Hannu's.

There is indeed no deadlock detection. Good DBAs, or
DBAs working with good coders, will never come across
the problem :) but we did :(

I think it would be a great thing if a trace log were
sent to the DBA when a deadlock is detected, including,
if possible, the query that cannot be executed. Oracle
does a good job there.

Fabrice
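
As an illustration of the kind of report suggested above, here is a minimal sketch of a lock wait that gives up after a timeout and logs the query that was blocked. It uses plain Python threads and is not PostgreSQL code. The function names, the logger name, and the query strings are all hypothetical placeholders.

```python
import logging
import threading

log = logging.getLogger("lockwait")


def acquire_or_report(lock, lock_name, query, timeout_s):
    """Try to take `lock`; on timeout, log the blocked query and return False."""
    if lock.acquire(timeout=timeout_s):
        return True
    log.warning("possible deadlock: waited %.2fs for lock %s while running: %s",
                timeout_s, lock_name, query)
    return False


def demo(timeout_s=0.2):
    """Two workers take locks A and B in opposite order, so both get stuck."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)
    both_tried = threading.Barrier(2)
    results = {}

    def worker(name, first, second, second_name, query):
        with first:
            both_hold_first.wait()   # each worker now holds its first lock
            results[name] = acquire_or_report(second, second_name, query, timeout_s)
            both_tried.wait()        # neither releases until both have tried
            if results[name]:
                second.release()

    threads = [
        threading.Thread(target=worker,
                         args=("w1", lock_a, lock_b, "B", "placeholder query 1")),
        threading.Thread(target=worker,
                         args=("w2", lock_b, lock_a, "A", "placeholder query 2")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(demo())
```

A timeout only reports that a wait took too long; it cannot tell a true deadlock from a slow lock holder, so the log line is worded as "possible deadlock".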

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fabrice Scemama (#3)
Re: I want tips for debugging deadlocks

Fabrice Scemama <fabrices@ximmo.ftd.fr> writes:

> By the way, we finally understood that our main problem,
> the one that was making our Pg hang forever, comes from
> a deadlock. The same as Hannu's.
>
> There is indeed no deadlock detection. Good DBAs, or
> DBAs working with good coders, will never come across
> the problem :) but we did :(

The reason LockBuffer() has no deadlock detection is that it's not
supposed to be possible for a deadlock to occur there, user mistake
or no. So I consider this a bug --- either there is an actual logic
error somewhere, or else the supposition is wrong and we need to add
deadlock handling.

If you can describe what you were doing so that the behavior can be
reproduced, it'd be very helpful.

regards, tom lane
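
For readers unfamiliar with what "deadlock handling" involves in general, the sketch below shows the textbook wait-for-graph check: a deadlock exists when following "who is waiting on whom" leads back to a process already visited. This is a generic illustration only. It is not taken from the thread and does not describe how PostgreSQL or LockBuffer() works; the function name and the process numbers are hypothetical, and it assumes each process waits on at most one other process.

```python
def find_deadlock(waits_for):
    """Return the processes forming a cycle in the wait-for graph, or None.

    `waits_for` maps each waiting process to the one process that holds
    the lock it wants (at most one outgoing edge per process).
    """
    for start in waits_for:
        seen = []
        node = start
        # Follow the chain of waiters until it ends or repeats.
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node in seen:
            # The chain looped back: everything from `node` onward is the cycle.
            return seen[seen.index(node):]
    return None


if __name__ == "__main__":
    # Hypothetical backends: 101 waits on 102, and 102 waits on 101.
    print(find_deadlock({101: 102, 102: 101}))
    # A plain wait chain with no cycle.
    print(find_deadlock({101: 102, 102: 103}))
```

A detector like this needs every lock wait to be recorded in the graph, which is exactly what is missing for a lock that was assumed never to deadlock.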