Deadlock with pg_dump?

Started by Chris Campbell, over 19 years ago. 44 messages, pgsql-hackers.
#1Chris Campbell
chris@bignerdranch.com

We're getting deadlock error messages in the production database logs
during times of inactivity, where the only other thing using the
database (we think) is the every-15-minutes pg_dump process. There
are still database connections up-and-running from unused Hibernate
Java processes, but they Shouldn't Be doing anything (and shouldn't
be holding locks, etc).

The deadlock error message looks like this:

ERROR: deadlock detected
DETAIL: Process 1120 waits for ShareLock on transaction 5847116;
blocked by process 1171.
Process 1171 waits for ExclusiveLock on tuple (6549,28) of relation
37637 of database 37574; blocked by process 1120.

Relation 37636 is the users table (schema attached).

Process 1120 was running an UPDATE query and changing a single row in
the users table. The users table does have foreign keys to 4 other
tables. Is it possible that those foreign key constraints acquire
locks in a different order than pg_dump (as it's SELECTing from the
tables), and it's hitting at *just* the right time to cause a deadlock?
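
For illustration, the classic way two plain UPDATEs deadlock on tuple locks (not necessarily what happened here; the column name is made up):

```sql
-- Session 1
BEGIN;
UPDATE users SET login_count = login_count + 1 WHERE id = 1;

-- Session 2
BEGIN;
UPDATE users SET login_count = login_count + 1 WHERE id = 2;
UPDATE users SET login_count = login_count + 1 WHERE id = 1;  -- blocks on session 1

-- Session 1
UPDATE users SET login_count = login_count + 1 WHERE id = 2;  -- blocks on session 2;
-- after deadlock_timeout the deadlock checker aborts one session with
-- "ERROR: deadlock detected"
```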

I've tried to reproduce it on a test machine by running pgbench
(after adding foreign keys to the pgbench tables) and pg_dump in
tight loops in two concurrent shell scripts, but no deadlock.

Any ideas on how to track this down?

Under what conditions does a process acquire a ShareLock on another
transaction?
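
(Background on the mechanism, though not necessarily this specific case: a backend that must wait for a row lock takes a ShareLock on the XID of the transaction currently holding that row, and the wait ends when that transaction commits or aborts. A two-session sketch, table name illustrative:)

```sql
-- Session 1
BEGIN;
UPDATE users SET name = 'a' WHERE id = 1;  -- row is now locked by session 1's XID

-- Session 2
UPDATE users SET name = 'b' WHERE id = 1;  -- waits for ShareLock on
                                           -- session 1's transaction ID

-- Session 1
COMMIT;  -- releases the XID lock; session 2's UPDATE proceeds
```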

Thanks!

- Chris

Attachments:

schema.txt (text/plain)
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Campbell (#1)
Re: Deadlock with pg_dump?

Chris Campbell <chris@bignerdranch.com> writes:

ERROR: deadlock detected
DETAIL: Process 1120 waits for ShareLock on transaction 5847116;
blocked by process 1171.
Process 1171 waits for ExclusiveLock on tuple (6549,28) of relation
37637 of database 37574; blocked by process 1120.

Relation 37636 is the users table (schema attached).

Process 1120 was running an UPDATE query and changing a single row in
the users table.

And what was 1171 doing? I really doubt that either of these could have
been pg_dump.

Given that you appear to be running 8.1 (tut-tut for not saying), it
really shouldn't be a foreign key problem either. I'm betting these
are just flat out conflicting updates of the same row(s).

regards, tom lane

#3Chris Campbell
chris@bignerdranch.com
In reply to: Tom Lane (#2)
Re: Deadlock with pg_dump?

On Oct 26, 2006, at 17:21, Tom Lane wrote:

And what was 1171 doing? I really doubt that either of these could have
been pg_dump.

I know that process 1120 is a Java client (Hibernate) running an
UPDATE query, but I have no idea what 1171 is. I doubt that 1171 was
pg_dump, but when we turn off the pg_dump cron jobs (for 12-ish
hours), the deadlocks go away. We usually see 5 or 6 deadlocks spread
throughout the day. That's not definitive evidence, of course, but
it's certainly curious.

Given that you appear to be running 8.1 (tut-tut for not saying), it
really shouldn't be a foreign key problem either. I'm betting these
are just flat out conflicting updates of the same row(s).

Yeah, 8.1.3. Sorry about the omission.

Is there additional logging information I can turn on to get more
details? I guess I need to see exactly what locks both processes
hold, and what queries they were running when the deadlock occurred?
Is that easily done, without turning on logging for *all* statements?
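
(One way to snapshot this while the blocking is happening, short of full statement logging, is to join pg_locks against pg_stat_activity; column names below are from recent releases, and on 8.1 the equivalents are procpid and current_query:)

```sql
-- Show every lock, ungranted (waiting) ones first, with the SQL of each holder.
SELECT a.pid, l.locktype, l.mode, l.granted,
       l.relation::regclass, l.transactionid, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
ORDER BY l.granted, a.pid;
```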

Thanks!

- Chris

#4Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Chris Campbell (#3)
Re: Deadlock with pg_dump?

On Thu, Oct 26, 2006 at 06:11:59PM -0400, Chris Campbell wrote:

On Oct 26, 2006, at 17:21, Tom Lane wrote:

And what was 1171 doing? I really doubt that either of these could have
been pg_dump.

I know that process 1120 is a Java client (Hibernate) running an
UPDATE query, but I have no idea what 1171 is. I doubt that 1171 was
pg_dump, but when we turn off the pg_dump cron jobs (for 12-ish
hours), the deadlocks go away. We usually see 5 or 6 deadlocks spread
throughout the day. That's not definitive evidence, of course, but
it's certainly curious.

I seem to remember something funny about hibernate and locking, though I
can't recall any details right now... but searching the archives might
provide insight.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Campbell (#3)
Re: Deadlock with pg_dump?

Chris Campbell <chris@bignerdranch.com> writes:

Is there additional logging information I can turn on to get more
details? I guess I need to see exactly what locks both processes
hold, and what queries they were running when the deadlock occurred?
Is that easily done, without turning on logging for *all* statements?

log_min_error_statement = error would at least get you the statements
reporting the deadlocks, though not what they're conflicting against.
Still, if you're seeing multiple occurrences per day, that would
probably let you build up a good picture of all the involved operations
over a couple of days.

[ Memo to hackers: why is it that log_min_error_statement = error
isn't the default? ]

Unless your applications are issuing actual LOCK TABLE commands, it's
really hard to see how pg_dump could be involved. It doesn't take
anything stronger than AccessShareLock, and shouldn't be holding any
tuple-level locks at all.
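
(A sketch of the only kind of conflict AccessShareLock can produce, since it conflicts solely with AccessExclusiveLock; table name illustrative:)

```sql
-- Session 1 (stands in for pg_dump)
BEGIN;
SELECT count(*) FROM users;   -- AccessShareLock on users, held to end of txn

-- Session 2
ALTER TABLE users ADD COLUMN note text;  -- needs AccessExclusiveLock;
                                         -- blocks until session 1 ends

-- Session 1
COMMIT;  -- session 2's ALTER TABLE now runs
```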

regards, tom lane

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: Deadlock with pg_dump?

Tom Lane wrote:

[ Memo to hackers: why is it that log_min_error_statement = error
isn't the default? ]

I think it defaults to panic because that's the way to disable the feature,
which was the easiest sell when the feature was introduced. I'm all for
lowering it to error.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#7Chris Campbell
chris@bignerdranch.com
In reply to: Tom Lane (#5)
Re: Deadlock with pg_dump?

On Oct 26, 2006, at 18:45, Tom Lane wrote:

log_min_error_statement = error would at least get you the statements
reporting the deadlocks, though not what they're conflicting against.

Would it be possible (in 8.3, say) to log the conflicting backend's
current statement (from pg_stat_activity, perhaps)? I guess the
conflicting backend would currently be waiting for a lock, so its
current query (before releasing the lock) is the one we want.

Thanks!

- Chris

#8Albe Laurenz
all@adv.magwien.gv.at
In reply to: Chris Campbell (#7)
Re: Deadlock with pg_dump?

[ Memo to hackers: why is it that log_min_error_statement = error
isn't the default? ]

To avoid spamming the logs with every failed SQL statement?

Yours,
Laurenz Albe

#9Csaba Nagy
nagy@ecircle-ag.com
In reply to: Albe Laurenz (#8)
Re: Deadlock with pg_dump?

On Fri, 2006-10-27 at 09:23, Albe Laurenz wrote:

[ Memo to hackers: why is it that log_min_error_statement = error
isn't the default? ]

To avoid spamming the logs with every failed SQL statement?

And it would hurt applications where query failure is taken as a
valid path (such as inserting first and updating if the insert fails)...

Cheers,
Csaba.

#10Andrew Dunstan
andrew@dunslane.net
In reply to: Csaba Nagy (#9)
Re: Deadlock with pg_dump?

Csaba Nagy wrote:

On Fri, 2006-10-27 at 09:23, Albe Laurenz wrote:

[ Memo to hackers: why is it that log_min_error_statement = error
isn't the default? ]

To avoid spamming the logs with every failed SQL statement?

And it would be hurting applications where query failure is taken as a
valid path (as inserting first and update if failing)...

Both of these are arguments in favor of being able to alter the level,
which would still exist. They are not good arguments for not having error
as the default level.

cheers

andrew

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Albe Laurenz (#8)
Re: Deadlock with pg_dump?

"Albe Laurenz" <all@adv.magwien.gv.at> writes:

[ Memo to hackers: why is it that log_min_error_statement = error
isn't the default? ]

To avoid spamming the logs with every failed SQL statement?

Certainly there are people who will turn it off, but that's why it's
configurable. I've had to answer "how do I find out what's causing
error message FOO" often enough that I'm starting to think logging error
statements is a more useful default than not logging 'em ...

regards, tom lane

#12Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#5)
Re: Deadlock with pg_dump?

On Thu, 2006-10-26 at 18:45 -0400, Tom Lane wrote:

Chris Campbell <chris@bignerdranch.com> writes:

Is there additional logging information I can turn on to get more
details? I guess I need to see exactly what locks both processes
hold, and what queries they were running when the deadlock occurred?
Is that easily done, without turning on logging for *all* statements?

log_min_error_statement = error would at least get you the statements
reporting the deadlocks, though not what they're conflicting against.

Yeh, we need a much better locking logger for performance analysis.

We really need to dump the whole wait-for graph for deadlocks, since
this might be more complex than just two statements involved. Deadlocks
ought to be so infrequent that we can afford the log space to do this -
plus if we did this it would likely lead to fewer deadlocks.

For 8.3 I'd like to have a log_min_duration_lockwait (secs) parameter
that would allow you to dump the wait-for graph for any data-level locks
that wait too long, rather than just those that deadlock. Many
applications experience heavy locking because of lack of holistic
design. That will also show up the need for other utilities to act
CONCURRENTLY, if possible.
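
(For reference, what 8.3 eventually shipped is a boolean tied to the deadlock-check interval rather than a separate duration parameter:)

```
# postgresql.conf (8.3 and later)
log_lock_waits = on      # log a message when a lock wait exceeds...
deadlock_timeout = 1s    # ...this interval
```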

[ Memo to hackers: why is it that log_min_error_statement = error
isn't the default? ]

For production systems where we expect fewer ERRORs, each one is
important, so this would be a good default.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Campbell (#3)
Re: Deadlock with pg_dump?

Chris Campbell <chris@bignerdranch.com> writes:

On Oct 26, 2006, at 17:21, Tom Lane wrote:

And what was 1171 doing? I really doubt that either of these could
have been pg_dump.

I know that process 1120 is a Java client (Hibernate) running an
UPDATE query, but I have no idea what 1171 is. I doubt that 1171 was
pg_dump, but when we turn off the pg_dump cron jobs (for 12-ish
hours), the deadlocks go away. We usually see 5 or 6 deadlocks spread
throughout the day. That's not definitive evidence, of course, but
it's certainly curious.

FWIW, we recently found a deadlock risk that's likely to be triggered in
situations involving multiple updates and foreign keys. I still don't
see any connection to pg_dump, but otherwise it seems like it might be
related to your problem. Try this patch:
http://archives.postgresql.org/pgsql-committers/2006-11/msg00121.php

regards, tom lane

#14Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#12)
Re: Deadlock with pg_dump?

Simon Riggs wrote:

Yeh, we need a much better locking logger for performance analysis.

We really need to dump the whole wait-for graph for deadlocks, since
this might be more complex than just two statements involved. Deadlocks
ought to be so infrequent that we can afford the log space to do this -
plus if we did this it would likely lead to fewer deadlocks.

For 8.3 I'd like to have a log_min_duration_lockwait (secs) parameter
that would allow you to dump the wait-for graph for any data-level locks
that wait too long, rather than just those that deadlock. Many
applications experience heavy locking because of lack of holistic
design. That will also show up the need for other utilities to act
CONCURRENTLY, if possible.

Old email, but I don't see how our current output is not good enough?

test=> lock a;
ERROR: deadlock detected
DETAIL: Process 6855 waits for AccessExclusiveLock on relation 16394 of
database 16384; blocked by process 6795.
Process 6795 waits for AccessExclusiveLock on relation 16396 of database
16384; blocked by process 6855.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#15Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#11)
Re: Deadlock with pg_dump?

Tom Lane wrote:

"Albe Laurenz" <all@adv.magwien.gv.at> writes:

[ Memo to hackers: why is it that log_min_error_statement = error
isn't the default? ]

To avoid spamming the logs with every failed SQL statement?

Certainly there are people who will turn it off, but that's why it's
configurable. I've had to answer "how do I find out what's causing
error message FOO" often enough that I'm starting to think logging error
statements is a more useful default than not logging 'em ...

Are we ready to set 'log_min_error_statement = error' by default for
8.3?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#15)
Re: Deadlock with pg_dump?

Bruce Momjian <bruce@momjian.us> writes:

Are we ready to set 'log_min_error_statement = error' by default for
8.3?

We already did that in 8.2.

regards, tom lane

#17Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#16)
Re: Deadlock with pg_dump?

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Are we ready to set 'log_min_error_statement = error' by default for
8.3?

We already did that in 8.2.

Oh, interesting. Oops again.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#18Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#14)
Re: Deadlock with pg_dump?

On Tue, 2007-02-13 at 22:19 -0500, Bruce Momjian wrote:

Old email, but I don't see how our current output is not good enough?

test=> lock a;
ERROR: deadlock detected
DETAIL: Process 6855 waits for AccessExclusiveLock on relation 16394 of
database 16384; blocked by process 6795.
Process 6795 waits for AccessExclusiveLock on relation 16396 of database
16384; blocked by process 6855.

This detects deadlocks, but it doesn't detect lock waits.

When I wrote that it was previous experience driving me. Recent client
experience has highlighted the clear need for this. We had a lock wait
of 50 hours because of an RI check; that's the kind of thing I'd like to
show up in the logs somewhere.

Lock wait detection can be used to show up synchronisation points that
have been inadvertently designed into an application, so it's a useful
tool in investigating performance issues.

I have a patch implementing the logging as agreed with Tom, will post to
patches later tonight.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#19Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#18)
Re: [HACKERS] Deadlock with pg_dump?

On Mon, 2007-02-19 at 19:38 +0000, Simon Riggs wrote:

I have a patch implementing the logging as agreed with Tom, will post to
patches later tonight.

Patch for discussion attached; it includes doc entries at the top of the
patch, so it's fairly clear how it works.

Output is an INFO message so that it triggers log_min_error_statement
when it generates a message, letting us see the SQL statement that is
waiting. That lets it generate a message before the statement completes,
which is important because in some cases the statement may never
complete, so simply logging a list of pids won't always tell you what
SQL was waiting.

Other approaches are possible...

Comments?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Attachments:

log_lock_waits.v1.patch (text/x-patch; charset=UTF-8; +168, -96 lines)
#20Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#19)
Re: [HACKERS] Deadlock with pg_dump?

I am a little concerned about a log_* setting that is INFO. I understand
why you used INFO (so log_min_error_statement picks it up), but INFO is
inconsistent with the log_* prefix, and by default INFO doesn't appear in
the log file.

So, by default, the INFO is going to go to the user terminal, and not to
the logfile.

Ideas?

---------------------------------------------------------------------------

Simon Riggs wrote:

Patch for discussion, includes doc entries at top of patch, so its
fairly clear how it works.

Output is an INFO message, to allow this to trigger
log_min_error_statement when it generates a message, to allow us to see
the SQL statement that is waiting. This allows it to generate a message
prior to the statement completing, which is important because it may not
ever complete, in some cases, so simply logging a list of pids won't
always tell you what the SQL was that was waiting.

Other approaches are possible...

Comments?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

[ Attachment, skipping... ]

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#21Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#21)
#23Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#23)
#25Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#25)
#27Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#26)
#28Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#21)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#28)
#30Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#30)
#32Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#31)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#32)
#34Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#33)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#34)
#36Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#35)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#36)
#38Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#37)
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#38)
#40Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#39)
#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#40)
#42Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#41)
#43Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#19)
#44Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#19)