deadlock debug methodology

Started by antiochus antiochusalmost 18 years ago1 messagesgeneral
Jump to latest
#1antiochus antiochus
antiochus.usa@gmail.com

Hi All,

I have a deadlock situation, two transactions waiting on each other to
complete. Based on the details below, does anyone have recommendations for
me. Thanks.

Regards,

A.

I am using:
rpm -qa|grep postgres
compat-postgresql-libs-4-2PGDG.rhel5_x86_64
postgresql-server-8.2.6-1PGDG.rhel5
postgresql-8.2.6-1PGDG.rhel5
postgresql-devel-8.2.6-1PGDG.rhel5
postgresql-libs-8.2.6-1PGDG.rhel5

I set 'deadlock_timeout = 1h' in order to have time to inspect pg_locks.

The locks are:

db0=# select * from pg_locks where not granted;
locktype | database | relation | page | tuple | transactionid |
classid | objid | objsubid | transaction | pid | mode | granted
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+-----------+---------
transactionid | | | | | 1407
| | | | 1404 | 8303 | ShareLock | f
transactionid | | | | | 1404
| | | | 1407 | 8277 | ShareLock | f
(2 rows)

Each transaction seems to be waiting on a row-level lock the other has
acquired. The tuples are:

db0=# select * from pg_locks where locktype='tuple';
locktype | database | relation | page | tuple | transactionid | classid |
objid | objsubid | transaction | pid | mode | granted
----------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+---------------+---------
tuple | 16384 | 16576 | 38 | 6 | |
| | | 1407 | 8277 | ExclusiveLock | t
tuple | 16384 | 16576 | 38 | 5 | |
| | | 1404 | 8303 | ShareLock | t
(2 rows)

The corresponding rows are:

db0=# select id from tt where ctid = '(38,6)';
id
-----
600
(1 row)
db0=# select id from tt where ctid = '(38,5)';
id
-----
611
(1 row)

Note that the id column is defined as 'id serial primary key'.

The two queries in effect in each transaction are found using:

select current_query from pg_stat_activity where procpid = 8303;
select current_query from pg_stat_activity where procpid = 8277;

Careful inspection of these (unfortunately complex) queries seems to
indicate row-level locks are acquired in consistent order, assuming that any
command of the type

update tt where ....

will always lock rows in a consistent order (can someone confirm that it is
necessarily the case).

Therefore, it is not clear to me how this deadlock situation arises.

Does anyone have a recommendation?