dropping a partition may cause deadlock

Started by Amit Langote, about 9 years ago, 4 messages, hackers
#1 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp

Hi,

I noticed that a deadlock can occur because of the order in which locks are
taken when dropping a partition. Steps to reproduce:

1. Attach debugger to two sessions, one of which will do a select on the
partitioned parent and the other will drop one of its partitions.

2. In the first debugging session, set a breakpoint at the start of
expand_inherited_rtentry() which is the first point in a select query's
processing where individual partitions will be locked (the parent will
have already been locked by the rewriter).

3. In the second session, set a breakpoint at the start of
heap_drop_with_catalog(), which is the first point in the drop command's
processing where the parent will be locked (the partition will have
already been locked by RangeVarGetRelidExtended()). This will wait for
the first session to release the lock on the parent.

4. In the first session, proceeding with locking of the partition will
cause it to wait for the second session, which is holding a lock on it; a
deadlock is detected, because that session is in turn waiting for the first
session to release the lock on the parent.
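
The interleaving in steps 1-4 can be replayed with a toy Python model. This is only a sketch and not PostgreSQL code: the names `simulate`, `has_cycle`, `lock_orders`, and `schedule` are made up for illustration, and the model assumes that any two locks on the same relation conflict and that locks are never released.

```python
# Toy model of the interleaving in steps 1-4; this is not PostgreSQL code.
# Assumption: any two locks on the same relation conflict.

def simulate(lock_orders, schedule):
    """Replay lock requests in schedule order and return the wait-for edges.

    lock_orders maps a session name to the relations it locks, in order.
    schedule lists which session attempts its next lock at each step.
    """
    pending = {name: list(rels) for name, rels in lock_orders.items()}
    holder = {}     # relation -> session that holds its lock
    waits_for = {}  # blocked session -> session it is waiting on
    for session in schedule:
        if session in waits_for or not pending[session]:
            continue  # blocked or finished sessions make no progress
        rel = pending[session][0]
        owner = holder.setdefault(rel, session)
        if owner == session:
            pending[session].pop(0)     # lock granted
        else:
            waits_for[session] = owner  # lock held elsewhere: block
    return waits_for


def has_cycle(waits_for):
    """True if following wait-for edges from some session returns to it."""
    for start in waits_for:
        seen, current = set(), start
        while current in waits_for and current not in seen:
            seen.add(current)
            current = waits_for[current]
        if current == start:
            return True
    return False


lock_orders = {
    # parent locked by the rewriter, partition in expand_inherited_rtentry()
    "select": ["parent", "partition"],
    # partition locked by RangeVarGetRelidExtended(), parent in
    # heap_drop_with_catalog()
    "drop": ["partition", "parent"],
}
schedule = ["select", "drop", "drop", "select"]
edges = simulate(lock_orders, schedule)
print(edges, has_cycle(edges))
```

Each session ends up waiting on the other, which is the cycle the deadlock detector reports. Giving "drop" the same order as "select" in this model leaves only a plain wait, with no cycle.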

Attached is a patch to fix that. In the original partitioning patch, I
had aped the approach of index_drop() where the parent heap relation is
locked along with the index relation so that the parent's cached list of
indexes can be invalidated. But I failed to also ape what
RangeVarCallbackForDropRelation() does when dropping an index, which is to
lock the parent heap relation before locking the index relation at all.
For the partition-drop case, this means we lock the parent before we lock
the partition relation.
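
The effect of the reordering can be checked with a small Python sketch that compares the order in which two sessions take their locks. It is an illustration only: `inverted_pairs` and the three order lists are hypothetical names, and lock modes are ignored, so every shared relation is treated as a potential conflict.

```python
from itertools import combinations


def inverted_pairs(order_a, order_b):
    """Return pairs of relations that the two sessions lock in opposite order.

    A non-empty result means the sessions can block each other in a cycle.
    """
    pos_a = {rel: i for i, rel in enumerate(order_a)}
    pos_b = {rel: i for i, rel in enumerate(order_b)}
    shared = [rel for rel in order_a if rel in pos_b]
    return [
        (x, y)
        for x, y in combinations(shared, 2)
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
    ]


# Lock orders as described in this message (illustrative labels only).
select_order = ["parent", "partition"]
drop_before_fix = ["partition", "parent"]
drop_after_fix = ["parent", "partition"]

print(inverted_pairs(select_order, drop_before_fix))
print(inverted_pairs(select_order, drop_after_fix))
```

Before the fix the pair (parent, partition) is taken in opposite orders by the two sessions; after the fix both take the parent first, so the drop can still wait for the select but the two can no longer wait on each other.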

Will add this to open items list.

Thanks,
Amit

Attachments:

0001-Fix-possibility-of-deadlock-when-dropping-partitions.patch (text/x-diff, +47 -14)
#2 Noah Misch
noah@leadboat.com
In reply to: Amit Langote (#1)
Re: dropping a partition may cause deadlock

On Mon, Apr 03, 2017 at 03:48:05PM +0900, Amit Langote wrote:

> I noticed that a deadlock can occur because of the order in which locks are
> taken when dropping a partition. Steps to reproduce:
>
> 1. Attach debugger to two sessions, one of which will do a select on the
> partitioned parent and the other will drop one of its partitions.
>
> 2. In the first debugging session, set a breakpoint at the start of
> expand_inherited_rtentry() which is the first point in a select query's
> processing where individual partitions will be locked (the parent will
> have already been locked by the rewriter).
>
> 3. In the second session, set a breakpoint at the start of
> heap_drop_with_catalog(), which is the first point in the drop command's
> processing where the parent will be locked (the partition will have
> already been locked by RangeVarGetRelidExtended()). This will wait for
> the first session to release the lock on the parent.
>
> 4. In the first session, proceeding with locking of the partition will
> cause it to wait for the second session, which is holding a lock on it; a
> deadlock is detected, because that session is in turn waiting for the first
> session to release the lock on the parent.
>
> Attached is a patch to fix that. In the original partitioning patch, I
> had aped the approach of index_drop() where the parent heap relation is
> locked along with the index relation so that the parent's cached list of
> indexes can be invalidated. But I failed to also ape what
> RangeVarCallbackForDropRelation() does when dropping an index, which is to
> lock the parent heap relation before locking the index relation at all.
> For the partition-drop case, this means we lock the parent before we lock
> the partition relation.
>
> Will add this to open items list.

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Robert,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.

[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#2)
Re: dropping a partition may cause deadlock

On Sun, Apr 9, 2017 at 7:57 PM, Noah Misch <noah@leadboat.com> wrote:

> The above-described topic is currently a PostgreSQL 10 open item. Robert,
> since you committed the patch believed to have created it, you own this open
> item. If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know. Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message. Include a date for your subsequent status update. Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10. Consequently, I will appreciate your efforts
> toward speedy resolution. Thanks.

I have committed the patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#4 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#3)
Re: dropping a partition may cause deadlock

On 2017/04/11 22:18, Robert Haas wrote:

> On Sun, Apr 9, 2017 at 7:57 PM, Noah Misch <noah@leadboat.com> wrote:
>> The above-described topic is currently a PostgreSQL 10 open item. Robert,
>> since you committed the patch believed to have created it, you own this open
>> item. If some other commit is more relevant or if this does not belong as a
>> v10 open item, please let us know. Otherwise, please observe the policy on
>> open item ownership[1] and send a status update within three calendar days of
>> this message. Include a date for your subsequent status update. Testers may
>> discover new open items at any time, and I want to plan to get them all fixed
>> well in advance of shipping v10. Consequently, I will appreciate your efforts
>> toward speedy resolution. Thanks.
>
> I have committed the patch.

Thanks.

Regards,
Amit
