Race between SELECT and ALTER TABLE NO INHERIT
Hello.
I had a case of unexpected error caused by ALTER TABLE NO
INHERIT.
=# CREATE TABLE p (a int);
=# CREATE TABLE c1 () INHERITS (p);
session A=# BEGIN;
session A=# ALTER TABLE c1 NO INHERIT p;
session B=# EXPLAIN ANALYZE SELECT * FROM p;
(blocked)
session A=# COMMIT;
session B: ERROR: could not find inherited attribute "a" of relation "c1"
This happens at least back to 9.1 to master and doesn't seem to
be a designed behavior.
The cause is that NO INHERIT doesn't take an exlusive lock on the
parent. This allows expand_inherited_rtentry to add the child
relation into appendrel after removal from the inheritance but
still exists.
I see two ways to fix this.
The first patch adds a recheck of inheritance relationship if the
corresponding attribute is missing in the child in
make_inh_translation_list(). The recheck is a bit complex but it
is not performed unless the sequence above is happen. It checks
duplication of relid (or cycles in inheritance) following
find_all_inheritors (but doing a bit different) but I'm not sure
it is really useful.
The second patch lets ALTER TABLE NO INHERIT to acquire locks on
the parent first.
Since the latter has a larger impact on the current behavior and
we already treat "DROP TABLE child" case in the similar way, I
suppose that the first approach would be preferable.
Any comments or thoughts?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello,
On 2017/06/26 17:46, Kyotaro HORIGUCHI wrote:
Hello.
I had a case of unexpected error caused by ALTER TABLE NO
INHERIT.=# CREATE TABLE p (a int);
=# CREATE TABLE c1 () INHERITS (p);session A=# BEGIN;
session A=# ALTER TABLE c1 NO INHERIT p;session B=# EXPLAIN ANALYZE SELECT * FROM p;
(blocked)session A=# COMMIT;
session B: ERROR: could not find inherited attribute "a" of relation "c1"
This happens at least back to 9.1 to master and doesn't seem to
be a designed behavior.
I recall I had proposed a fix for the same thing some time ago [1]/messages/by-id/565EB1F2.7000201@lab.ntt.co.jp.
The cause is that NO INHERIT doesn't take an exlusive lock on the
parent. This allows expand_inherited_rtentry to add the child
relation into appendrel after removal from the inheritance but
still exists.
Right.
I see two ways to fix this.
The first patch adds a recheck of inheritance relationship if the
corresponding attribute is missing in the child in
make_inh_translation_list(). The recheck is a bit complex but it
is not performed unless the sequence above is happen. It checks
duplication of relid (or cycles in inheritance) following
find_all_inheritors (but doing a bit different) but I'm not sure
it is really useful.
I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
I guess your hash table based solution will do the job as far as
performance of this check is concerned, although I haven't checked the
code closely.
The second patch lets ALTER TABLE NO INHERIT to acquire locks on
the parent first.Since the latter has a larger impact on the current behavior and
we already treat "DROP TABLE child" case in the similar way, I
suppose that the first approach would be preferable.
That makes sense.
BTW, in the partitioned table case, the parent is always locked first
using an AccessExclusiveLock. There are other considerations in that case
such as needing to recreate the partition descriptor upon termination of
inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
Thanks,
Amit
[1]: /messages/by-id/565EB1F2.7000201@lab.ntt.co.jp
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello,
At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <7f5fe522-f328-3c40-565f-5e1ce37455d1@lab.ntt.co.jp>
Hello,
On 2017/06/26 17:46, Kyotaro HORIGUCHI wrote:
Hello.
I had a case of unexpected error caused by ALTER TABLE NO
INHERIT.=# CREATE TABLE p (a int);
=# CREATE TABLE c1 () INHERITS (p);session A=# BEGIN;
session A=# ALTER TABLE c1 NO INHERIT p;session B=# EXPLAIN ANALYZE SELECT * FROM p;
(blocked)session A=# COMMIT;
session B: ERROR: could not find inherited attribute "a" of relation "c1"
This happens at least back to 9.1 to master and doesn't seem to
be a designed behavior.I recall I had proposed a fix for the same thing some time ago [1].
Wow. About 1.5 years ago and stuck? I have a concrete case where
this harms so I'd like to fix that this time. How can we move on?
The cause is that NO INHERIT doesn't take an exlusive lock on the
parent. This allows expand_inherited_rtentry to add the child
relation into appendrel after removal from the inheritance but
still exists.Right.
I see two ways to fix this.
The first patch adds a recheck of inheritance relationship if the
corresponding attribute is missing in the child in
make_inh_translation_list(). The recheck is a bit complex but it
is not performed unless the sequence above is happen. It checks
duplication of relid (or cycles in inheritance) following
find_all_inheritors (but doing a bit different) but I'm not sure
it is really useful.I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
I guess your hash table based solution will do the job as far as
performance of this check is concerned, although I haven't checked the
code closely.
The hash table is not crucial in the patch. Substantially it just
recurs down looking up pg_inherits for the child. I considered
the new index but abandoned because I thought that such case
won't occur so frequently.
The second patch lets ALTER TABLE NO INHERIT to acquire locks on
the parent first.Since the latter has a larger impact on the current behavior and
we already treat "DROP TABLE child" case in the similar way, I
suppose that the first approach would be preferable.That makes sense.
BTW, in the partitioned table case, the parent is always locked first
using an AccessExclusiveLock. There are other considerations in that case
such as needing to recreate the partition descriptor upon termination of
inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
Apart from the degree of concurrency, if we keep parent->children
order of locking, such recreation does not seem to be
needed. Maybe I'm missing something.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote:
At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote:
I recall I had proposed a fix for the same thing some time ago [1].
Wow. About 1.5 years ago and stuck? I have a concrete case where
this harms so I'd like to fix that this time. How can we move on?
Agreed that this should be fixed.
Your proposed approach #1 to recheck the inheritance after obtaining the
lock on the child table seems to be a good way forward.
Approach #2 of reordering locking is a simpler solution, but does not
completely prevent the problem, because DROP TABLE child can still cause
it to occur, as you mentioned.
The cause is that NO INHERIT doesn't take an exlusive lock on the
parent. This allows expand_inherited_rtentry to add the child
relation into appendrel after removal from the inheritance but
still exists.Right.
I see two ways to fix this.
The first patch adds a recheck of inheritance relationship if the
corresponding attribute is missing in the child in
make_inh_translation_list(). The recheck is a bit complex but it
is not performed unless the sequence above is happen. It checks
duplication of relid (or cycles in inheritance) following
find_all_inheritors (but doing a bit different) but I'm not sure
it is really useful.I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
I guess your hash table based solution will do the job as far as
performance of this check is concerned, although I haven't checked the
code closely.The hash table is not crucial in the patch. Substantially it just
recurs down looking up pg_inherits for the child. I considered
the new index but abandoned because I thought that such case
won't occur so frequently.
Agreed. BTW, the hash table in patch #1 does not seem to be really
helpful. In the while loop in is_descendant_of_internal(), does
hash_search() ever return found = true? AFAICS, it does not.
The second patch lets ALTER TABLE NO INHERIT to acquire locks on
the parent first.Since the latter has a larger impact on the current behavior and
we already treat "DROP TABLE child" case in the similar way, I
suppose that the first approach would be preferable.That makes sense.
BTW, in the partitioned table case, the parent is always locked first
using an AccessExclusiveLock. There are other considerations in that case
such as needing to recreate the partition descriptor upon termination of
inheritance (both the DETACH PARTITION and also DROP TABLE child cases).Apart from the degree of concurrency, if we keep parent->children
order of locking, such recreation does not seem to be
needed. Maybe I'm missing something.
Sorry to have introduced that topic in this thread, but I will try to
explain anyway why things are the way they are currently:
Once a table is no longer a partition of the parent (detached or dropped),
we must make sure that the next commands in the transaction don't see it
as one. That information is currently present in the relcache
(rd_partdesc), which is used by a few callers, most notably the
tuple-routing code. Next commands must recreate the entry so that the
correct thing happens based on the updated information. More precisely,
we must invalidate the current entry. RelationClearRelation() will either
delete the entry or rebuild it. If it's being referenced somewhere, it
will be rebuilt. The place holding the reference may also be looking at
the content of rd_partdesc, which we don't require them to make a copy of,
so we must preserve its content while other fields of RelationData are
being read anew from the catalog. We don't have to preserve it if there
has been any change (partition added/dropped), but to make such a change
one would need to take a strong enough lock on the relation (parent). We
assume here that anyone who wants to reference rd_partdesc takes at least
AccessShareLock lock on the relation, and anyone who wants to change its
content must take a lock that will conflict with it, so
AccessExclusiveLock. Note that in all of this, we are only talking about
one relation, that is the parent, so parent -> child ordering of taking
locks may be irrelevant.
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello,
I'll add this to CF2017-09.
At Tue, 27 Jun 2017 16:27:18 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <75fe42df-b1d8-89ff-596d-d9da0749e66d@lab.ntt.co.jp>
On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote:
At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote:
I recall I had proposed a fix for the same thing some time ago [1].
Wow. About 1.5 years ago and stuck? I have a concrete case where
this harms so I'd like to fix that this time. How can we move on?Agreed that this should be fixed.
Your proposed approach #1 to recheck the inheritance after obtaining the
lock on the child table seems to be a good way forward.Approach #2 of reordering locking is a simpler solution, but does not
completely prevent the problem, because DROP TABLE child can still cause
it to occur, as you mentioned.The cause is that NO INHERIT doesn't take an exlusive lock on the
parent. This allows expand_inherited_rtentry to add the child
relation into appendrel after removal from the inheritance but
still exists.Right.
I see two ways to fix this.
The first patch adds a recheck of inheritance relationship if the
corresponding attribute is missing in the child in
make_inh_translation_list(). The recheck is a bit complex but it
is not performed unless the sequence above is happen. It checks
duplication of relid (or cycles in inheritance) following
find_all_inheritors (but doing a bit different) but I'm not sure
it is really useful.I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
I guess your hash table based solution will do the job as far as
performance of this check is concerned, although I haven't checked the
code closely.The hash table is not crucial in the patch. Substantially it just
recurs down looking up pg_inherits for the child. I considered
the new index but abandoned because I thought that such case
won't occur so frequently.Agreed. BTW, the hash table in patch #1 does not seem to be really
helpful. In the while loop in is_descendant_of_internal(), does
hash_search() ever return found = true? AFAICS, it does not.The second patch lets ALTER TABLE NO INHERIT to acquire locks on
the parent first.Since the latter has a larger impact on the current behavior and
we already treat "DROP TABLE child" case in the similar way, I
suppose that the first approach would be preferable.That makes sense.
So, I attached only the first patch, rebased on the current
master (It actually failed to apply on it.) and fixed a typo in a
comment.
This still runs a closed-loop test using temporary hash but it
seems a bit paranoic. (this is the same check with what
find_all_inheritors is doing)
<< the following is another topic >>
BTW, in the partitioned table case, the parent is always locked first
using an AccessExclusiveLock. There are other considerations in that case
such as needing to recreate the partition descriptor upon termination of
inheritance (both the DETACH PARTITION and also DROP TABLE child cases).Apart from the degree of concurrency, if we keep parent->children
order of locking, such recreation does not seem to be
needed. Maybe I'm missing something.Sorry to have introduced that topic in this thread, but I will try to
explain anyway why things are the way they are currently:Once a table is no longer a partition of the parent (detached or dropped),
we must make sure that the next commands in the transaction don't see it
as one. That information is currently present in the relcache
(rd_partdesc), which is used by a few callers, most notably the
tuple-routing code. Next commands must recreate the entry so that the
correct thing happens based on the updated information. More precisely,
we must invalidate the current entry. RelationClearRelation() will either
delete the entry or rebuild it. If it's being referenced somewhere, it
will be rebuilt. The place holding the reference may also be looking at
the content of rd_partdesc, which we don't require them to make a copy of,
so we must preserve its content while other fields of RelationData are
being read anew from the catalog. We don't have to preserve it if there
has been any change (partition added/dropped), but to make such a change
one would need to take a strong enough lock on the relation (parent). We
assume here that anyone who wants to reference rd_partdesc takes at least
AccessShareLock lock on the relation, and anyone who wants to change its
content must take a lock that will conflict with it, so
AccessExclusiveLock. Note that in all of this, we are only talking about
one relation, that is the parent, so parent -> child ordering of taking
locks may be irrelevant.
I think I understand this, anyway DropInherit and DropPartition
is different-but-at-the-same-level operations so surely needs
amendment for drop/detach cases. Is there already a solution? Or
reproducing steps?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
ignore_no_longer_child.patchtext/x-patch; charset=us-asciiDownload+114-18
At Mon, 28 Aug 2017 18:28:07 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170828.182807.98097766.horiguchi.kyotaro@lab.ntt.co.jp>
I'll add this to CF2017-09.
This patch got deadly atack from the commit 30833ba. I changed
the signature of expand_single_inheritance_child in addition to
make_inh_translation_list to notify that the specified child is
no longer a child of the parent.
This passes regular regression test and fixed the the problem.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
ignore_no_longer_child_v2_69835bc.patchtext/x-patch; charset=us-asciiDownload+144-36
On 26 June 2017 at 10:16, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
BTW, in the partitioned table case, the parent is always locked first
using an AccessExclusiveLock. There are other considerations in that case
such as needing to recreate the partition descriptor upon termination of
inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
Is this requirement documented or in comments anywhere?
I can't see anything about that, which is a fairly major usage point.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/13 12:05, Simon Riggs wrote:
On 26 June 2017 at 10:16, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
BTW, in the partitioned table case, the parent is always locked first
using an AccessExclusiveLock. There are other considerations in that case
such as needing to recreate the partition descriptor upon termination of
inheritance (both the DETACH PARTITION and also DROP TABLE child cases).Is this requirement documented or in comments anywhere?
Yes. See the last sentence in the description of PARTITION OF clause in
CREATE TABLE:
https://www.postgresql.org/docs/devel/static/sql-createtable.html#sql-createtable-partition
And, the 4th point in the list of differences between declarative
partitioning and inheritance:
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jun 26, 2017 at 4:46 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
The cause is that NO INHERIT doesn't take an exlusive lock on the
parent. This allows expand_inherited_rtentry to add the child
relation into appendrel after removal from the inheritance but
still exists.I see two ways to fix this.
The first patch adds a recheck of inheritance relationship if the
corresponding attribute is missing in the child in
make_inh_translation_list(). The recheck is a bit complex but it
is not performed unless the sequence above is happen. It checks
duplication of relid (or cycles in inheritance) following
find_all_inheritors (but doing a bit different) but I'm not sure
it is really useful.The second patch lets ALTER TABLE NO INHERIT to acquire locks on
the parent first.Since the latter has a larger impact on the current behavior and
we already treat "DROP TABLE child" case in the similar way, I
suppose that the first approach would be preferable.Any comments or thoughts?
I agree that the second has less user impact, but I wonder if we can
think that will really fix the bug completely, or more generally if it
will fix all of the bugs that come from ALTER TABLE .. NO INHERIT not
locking the parent. I have a sneaking suspicion that may be wishful
thinking.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
At Wed, 13 Sep 2017 20:20:48 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZnKgN1dCQaBwHn1aDVPyAfQ8PGRmyUH22hxy39+Aqawg@mail.gmail.com>
On Mon, Jun 26, 2017 at 4:46 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:The cause is that NO INHERIT doesn't take an exlusive lock on the
parent. This allows expand_inherited_rtentry to add the child
relation into appendrel after removal from the inheritance but
still exists.I see two ways to fix this.
The first patch adds a recheck of inheritance relationship if the
corresponding attribute is missing in the child in
make_inh_translation_list(). The recheck is a bit complex but it
is not performed unless the sequence above is happen. It checks
duplication of relid (or cycles in inheritance) following
find_all_inheritors (but doing a bit different) but I'm not sure
it is really useful.The second patch lets ALTER TABLE NO INHERIT to acquire locks on
the parent first.Since the latter has a larger impact on the current behavior and
we already treat "DROP TABLE child" case in the similar way, I
suppose that the first approach would be preferable.Any comments or thoughts?
I agree that the second has less user impact, but I wonder if we can
think that will really fix the bug completely, or more generally if it
will fix all of the bugs that come from ALTER TABLE .. NO INHERIT not
locking the parent. I have a sneaking suspicion that may be wishful
thinking.
Thanks for the comment.
The recheck patch prevent planner from involving just-detached
children while inheritance expansion. No simultaneous detatching
of children doesn't affect the planning before the time.
Once planner (the select side) gets lock on the child, the alter
side cannot do anything until the select finishes. If the alter
side won, the select side detects detaching immediately after the
lock is released then excludes the children. No problem will
occur ever after. Even in the case a child is replaced with
another table, it is nothing different from simple detaching.
As the result, I think that the recheck patch saves all possible
problem caused by simultaneously detached children.
However, the parent-locking patch is far smaller and it doesn't
need such an explanation on its perfectness. If another problem
occurs by simlultaneous detaching, it must haven't taken
necessary locks in the right order.
The most significant reason for my decision on this ptach was the
fact that the DROP child case have been resolved by rechecking
without parent locks, which is a kind of waste of resources if it
could be resolved without it. (And I saw that the same solution
is taken at least several places)
I don't think there's noticeable difference of behavior
(excluding performance) for users between the two solutions.
Is there anyone who has an opinion on the point?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi.
On 2017/08/28 18:28, Kyotaro HORIGUCHI wrote:
<< the following is another topic >>
BTW, in the partitioned table case, the parent is always locked first
using an AccessExclusiveLock. There are other considerations in that case
such as needing to recreate the partition descriptor upon termination of
inheritance (both the DETACH PARTITION and also DROP TABLE child cases).Apart from the degree of concurrency, if we keep parent->children
order of locking, such recreation does not seem to be
needed. Maybe I'm missing something.Sorry to have introduced that topic in this thread, but I will try to
explain anyway why things are the way they are currently:Once a table is no longer a partition of the parent (detached or dropped),
we must make sure that the next commands in the transaction don't see it
as one. That information is currently present in the relcache
(rd_partdesc), which is used by a few callers, most notably the
tuple-routing code. Next commands must recreate the entry so that the
correct thing happens based on the updated information. More precisely,
we must invalidate the current entry. RelationClearRelation() will either
delete the entry or rebuild it. If it's being referenced somewhere, it
will be rebuilt. The place holding the reference may also be looking at
the content of rd_partdesc, which we don't require them to make a copy of,
so we must preserve its content while other fields of RelationData are
being read anew from the catalog. We don't have to preserve it if there
has been any change (partition added/dropped), but to make such a change
one would need to take a strong enough lock on the relation (parent). We
assume here that anyone who wants to reference rd_partdesc takes at least
AccessShareLock lock on the relation, and anyone who wants to change its
content must take a lock that will conflict with it, so
AccessExclusiveLock. Note that in all of this, we are only talking about
one relation, that is the parent, so parent -> child ordering of taking
locks may be irrelevant.I think I understand this, anyway DropInherit and DropPartition
is different-but-at-the-same-level operations so surely needs
amendment for drop/detach cases. Is there already a solution? Or
reproducing steps?
Sorry, I think I forgot to reply to this. Since you seem to have chosen
the other solution (checking that child is still a child), maybe this
reply is a bit too late, but anyway.
DropInherit or NO INHERIT is seen primarily as changing a child table's
(which is the target table of the command) property that it is no longer a
child of the parent, so we lock the child table to block concurrent
operations from considering it a child of parent anymore. The fact that
parent is locked after the child and with ShareUpdateExclusiveLock instead
of AccessExclusiveLock, we observe this race condition when SELECTing from
the parent.
DropPartition or DETACH PARTITION is seen primarily as changing the parent
table's (which is the target table of the command) property that one of
the partitions is removed, so we lock the parent. Any concurrent
operations that rely on the parent's relcache to get the partition list
will wait for the session that is dropping the partition to finish, so
that they get the fresh information from the relcache (or more importantly
do not end up with information obtained from the relcache going invalid
under them without notice). Note that the lock on the partition/child is
also present and it plays more or less the the same role as it does in the
DropInherit case, but due to different order of locking, reported race
condition does not occur between SELECT on partitioned table and
DROP/DETACH PARTITION.
By the way, I will take a look at your patch when I come back from the
vacation. Meanwhile, I noticed that it needs another rebase after
0a480502b092 [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a480502b092.
Thanks,
Amit
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a480502b092
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a480502b092
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/15 15:36, Amit Langote wrote:
The fact that
parent is locked after the child and with ShareUpdateExclusiveLock instead
of AccessExclusiveLock, we observe this race condition when SELECTing from
the parent.
Oops, I meant "parent table is locked with AccessShareLock instead of
AccessExclusiveLock..."
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
At Fri, 15 Sep 2017 15:36:26 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <d88a9a64-e307-59b5-b4c3-8d7fc11cb59d@lab.ntt.co.jp>
Hi.
On 2017/08/28 18:28, Kyotaro HORIGUCHI wrote:
<< the following is another topic >>
BTW, in the partitioned table case, the parent is always locked first
using an AccessExclusiveLock. There are other considerations in that case
such as needing to recreate the partition descriptor upon termination of
inheritance (both the DETACH PARTITION and also DROP TABLE child cases).Apart from the degree of concurrency, if we keep parent->children
order of locking, such recreation does not seem to be
needed. Maybe I'm missing something.Sorry to have introduced that topic in this thread, but I will try to
explain anyway why things are the way they are currently:Once a table is no longer a partition of the parent (detached or dropped),
we must make sure that the next commands in the transaction don't see it
as one. That information is currently present in the relcache
(rd_partdesc), which is used by a few callers, most notably the
tuple-routing code. Next commands must recreate the entry so that the
correct thing happens based on the updated information. More precisely,
we must invalidate the current entry. RelationClearRelation() will either
delete the entry or rebuild it. If it's being referenced somewhere, it
will be rebuilt. The place holding the reference may also be looking at
the content of rd_partdesc, which we don't require them to make a copy of,
so we must preserve its content while other fields of RelationData are
being read anew from the catalog. We don't have to preserve it if there
has been any change (partition added/dropped), but to make such a change
one would need to take a strong enough lock on the relation (parent). We
assume here that anyone who wants to reference rd_partdesc takes at least
AccessShareLock lock on the relation, and anyone who wants to change its
content must take a lock that will conflict with it, so
AccessExclusiveLock. Note that in all of this, we are only talking about
one relation, that is the parent, so parent -> child ordering of taking
locks may be irrelevant.I think I understand this, anyway DropInherit and DropPartition
is different-but-at-the-same-level operations so surely needs
amendment for drop/detach cases. Is there already a solution? Or
reproducing steps?Sorry, I think I forgot to reply to this. Since you seem to have chosen
the other solution (checking that child is still a child), maybe this
reply is a bit too late, but anyway.
I choosed it at that time for the reason mentioned upthread, but
haven't decided which is better.
DropInherit or NO INHERIT is seen primarily as changing a child table's
(which is the target table of the command) property that it is no longer a
child of the parent, so we lock the child table to block concurrent
operations from considering it a child of parent anymore. The fact that
parent is locked after the child and with ShareUpdateExclusiveLock instead
of AccessExclusiveLock, we observe this race condition when SELECTing from
the parent.DropPartition or DETACH PARTITION is seen primarily as changing the parent
table's (which is the target table of the command) property that one of
the partitions is removed, so we lock the parent. Any concurrent
operations that rely on the parent's relcache to get the partition list
will wait for the session that is dropping the partition to finish, so
that they get the fresh information from the relcache (or more importantly
do not end up with information obtained from the relcache going invalid
under them without notice). Note that the lock on the partition/child is
also present and it plays more or less the the same role as it does in the
DropInherit case, but due to different order of locking, reported race
condition does not occur between SELECT on partitioned table and
DROP/DETACH PARTITION.
Thank you for the explanation. I understand that the difference
comes from which of parent and children has the information about
inheritance/partitioning. DROP child and ALTER child NO INHERITS
are (I think) the only two operations that intiated from children
side. The parent-locking patch results in different solutions for
similar problems.
However, parent locking on DROPped child requires a new index
InheritsRelidIndex to avoid full scan on pg_inherits, but the NO
INHERITS case doesn't. This might be a good reason for the
difference.
I noticed that DROP TABLE partition acquires lock on the parent
in a callback for RangeVarGetRelid(Extended). The same way seems
reasonable for the NO INHERIT case. As the result the patch
becomes far small and less invasive than the previous
one. (dropinh_lock_parent_v2.patch)
As a negative PoC, attacched dropchild_lock_parent_PoC.patch only
to show how the same method on DROP TABLE case looks.
By the way, I will take a look at your patch when I come back from the
vacation. Meanwhile, I noticed that it needs another rebase after
0a480502b092 [1].
Great. Thanks.
Thanks,
Amit[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a480502b092
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Sep 19, 2017 at 3:04 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
By the way, I will take a look at your patch when I come back from the
vacation. Meanwhile, I noticed that it needs another rebase after
0a480502b092 [1].
Moved to CF 2018-01.
--
Michael
Hello,
At Wed, 29 Nov 2017 14:04:01 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqTt-mj4S8qoO_C9CaFAcV0J3vhgg4_Kw2U1wXvyEYB0bw@mail.gmail.com>
On Tue, Sep 19, 2017 at 3:04 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:By the way, I will take a look at your patch when I come back from the
vacation. Meanwhile, I noticed that it needs another rebase after
0a480502b092 [1].Moved to CF 2018-01.
Thank you. (I'll do that by myself from the next CF)
This is rebased patch and additional patch of isolation test for
this problem.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
[ 0002-Lock-parent-on-ALTER-TABLE-NO-INHERIT.patch ]
I don't especially like any of the patches proposed on this thread.
The one with rechecking inheritance seems expensive, duplicative,
and complicated. The approach of taking a lock on the parent will
create deadlocks in usage patterns that did not encounter such
deadlocks before --- in particular, in the scenarios in which this
behavior matters at all, the ALTER will deadlock against ordinary
queries that lock the parent before the child. So I can't believe
that anyone who's hitting the problem in the field will think the
extra lock is an improvement.
I think that there might be a much simpler solution to this, which
is to just remove make_inh_translation_list's tests of attinhcount,
as per attached. Those are really pretty redundant: since we are
matching by column name, the unique index on pg_attribute already
guarantees there is at most one matching column. I have a feeling
that those tests are my fault and I put them in on the theory that
they could save a few strcmp executions --- but if they're causing
problems, we can certainly drop them. In any case they'd only save
something meaningful if most of the child's columns are non-inherited,
which doesn't seem like the way to bet.
In this way, if the child gets past the initial check on whether it's
been dropped, we'll still succeed in matching its columns, even if
they are no longer marked as inherited. For me, this works fine with
either the ALTER NO INHERIT case (c1 does get scanned, with no error)
or the DROP TABLE case (c1 doesn't get scanned). Now, you can break
it if you really try hard: make the concurrent transaction do both
ALTER NO INHERIT and then DROP COLUMN. But that's not a scenario
that's been complained of, so I don't want to add a ton of mechanism
to fix it.
regards, tom lane
Attachments:
skip-unnecessary-attinhcount-checks.patchtext/x-diff; charset=us-ascii; name=skip-unnecessary-attinhcount-checks.patchDownload+2-2
I wrote:
I think that there might be a much simpler solution to this, which
is to just remove make_inh_translation_list's tests of attinhcount,
as per attached.
I went ahead and pushed that, with an isolation test based on the
one Horiguchi-san submitted but covering a few related cases.
regards, tom lane
Hello, sorry for my late reply.
At Wed, 10 Jan 2018 14:56:49 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <26017.1515614209@sss.pgh.pa.us>
I think that there might be a much simpler solution to this, which
is to just remove make_inh_translation_list's tests of attinhcount,
as per attached. Those are really pretty redundant: since we are
matching by column name, the unique index on pg_attribute already
guarantees there is at most one matching column. I have a feeling
My thought were restricted to the same behavior as the drop case,
but Tom's solution is also fine for me. I agree to the point that
the colums with the same name in a inheritance tree are safely
assumed to be in a inheritance relationship. (Assuming everything
is consistent.)
At Fri, 12 Jan 2018 15:52:08 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <15881.1515790328@sss.pgh.pa.us>
I wrote:
I think that there might be a much simpler solution to this, which
is to just remove make_inh_translation_list's tests of attinhcount,
as per attached.I went ahead and pushed that, with an isolation test based on the
one Horiguchi-san submitted but covering a few related cases.
Thank you for commiting it.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center