Support for REINDEX CONCURRENTLY
Hi all,
One of the outcomes of the discussions about the integration of pg_reorg in
core was that Postgres should provide some way to do REINDEX, CLUSTER and ALTER
TABLE concurrently with low-level locks, in a way similar to CREATE INDEX
CONCURRENTLY.
The discussion can be found in this thread:
http://archives.postgresql.org/pgsql-hackers/2012-09/msg00746.php
Well, I spent some spare time working on the implementation of REINDEX
CONCURRENTLY.
This basically allows read and write operations on a table whose
index(es) are being reindexed at the same time. Pretty useful for a production
environment. The caveats of this feature are that it is slower than a normal
reindex, and that it impacts other backends with the extra CPU, memory and
I/O it uses. The implementation is based on the same ideas as pg_reorg and
on an idea of Andres.
Please find attached a version that I consider a base for the next
discussions, perhaps a version that could be submitted to the commit fest
next month. The patch is aligned with postgres master at commit 09ac603.
With this feature, you can rebuild a table or an index with such commands:
REINDEX INDEX ind CONCURRENTLY;
REINDEX TABLE tab CONCURRENTLY;
The following restrictions apply:
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
- REINDEX CONCURRENTLY cannot run inside a transaction block.
- Shared tables cannot be reindexed concurrently.
- Indexes for exclusion constraints cannot be reindexed concurrently.
- Toast relations are reindexed non-concurrently when a table reindex is
run on a table that has toast relations.
Here is a description of what happens when reorganizing an index
concurrently
(the beginning of the process is similar to CREATE INDEX CONCURRENTLY):
1) Creation of a new index based on the same columns and restrictions as the
index being rebuilt (called here the old index). The new index is named
$OLDINDEX_cct, i.e. only a suffix _cct is added. It is marked as invalid and
not ready.
2) Take session locks on the old and new index(es), and on the parent table,
to prevent unfortunate drops.
3) Commit and start a new transaction
4) Wait until no running transactions could have the table open with the
old list of indexes.
5) Build the new indexes. All the new indexes are marked as indisready.
6) Commit and start a new transaction
7) Wait until no running transactions could have the table open with the
old list of indexes.
8) Take a reference snapshot and validate the new indexes
9) Wait for the old snapshots based on the reference snapshot
10) mark the new indexes as indisvalid
11) Commit and start a new transaction. At this point the old and new
indexes are both valid
12) Take a new reference snapshot and wait for the old snapshots, to ensure
that the old indexes are not corrupted.
13) Mark the old indexes as invalid
14) Swap new and old indexes, consisting here in switching their names.
15) Old indexes are marked as invalid.
16) Commit and start a new transaction
17) Wait for transactions that might use the old indexes
18) Old indexes are marked as not ready
19) Commit and start a new transaction
20) Drop the old indexes
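For readers skimming the thread, the flag transitions in the steps above can be sketched as a toy state machine. This is plain Python, not the patch's C code; the step numbers in the comments refer to the list above, and the helper names are made up for illustration:

```python
# Toy sketch of how the index flags evolve across the phases above.
# indisready: the index receives inserts/updates; indisvalid: the
# planner may use it for scans.

class Index:
    def __init__(self, ready, valid):
        self.ready = ready
        self.valid = valid

def reindex_concurrently_flags():
    old = Index(ready=True, valid=True)     # the index being rebuilt
    new = Index(ready=False, valid=False)   # step 1: created invalid, not ready

    new.ready = True                        # step 5: built, now receives writes
    new.valid = True                        # step 10: validated
    both_valid_window = old.valid and new.valid  # step 11: both usable at once

    old.valid = False                       # steps 13/15: planner stops using old
    old.ready = False                       # step 18: old stops receiving writes
    return old, new, both_valid_window      # step 20 would then drop `old`

old, new, window = reindex_concurrently_flags()
assert new.ready and new.valid          # new index fully usable
assert not old.ready and not old.valid  # old index safe to drop
assert window  # there is a window where both indexes are valid
```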
The following process might be reducible, but I would like that to be
decided based on community feedback and experience with such concurrent
features. For the time being I took an approach that looks slower but is,
to my mind, safer, with multiple waits (perhaps sometimes unnecessary?) and
subtransactions.
If an error occurs during the process, the table will finish with either
the old or the new index marked as invalid. In this case the user is in
charge of dropping the invalid index himself.
The concurrent index can easily be identified by its suffix *_cct.
This patch has required some refactoring effort, as I noticed that the
index code for concurrent operations was not very generic. To address that,
I created some new functions in index.c called index_concurrent_*, which
are used by both CREATE INDEX and REINDEX in my patch. Some refactoring has
also been done regarding the wait processes.
REINDEX TABLE and REINDEX INDEX follow the same code path
(ReindexConcurrentIndexes in indexcmds.c). The patch relies as much as
possible on the functions of index.c when creating, building and validating
a concurrent index.
Based on the comments of this thread, I would like to submit the patch at
the next commit fest. Just let me know if the approach taken by the current
implementation is OK or if it needs some modifications. That would be
really helpful.
The patch includes some regression tests for error checks and also some
documentation. Regression tests pass, and the code has no trailing
whitespace and no compilation warnings.
I have also run tests checking read and write operations on index scans of
the parent table at each step of the process (by using gdb to stop the
reindex process at precise places).
Thanks, and looking forward to your feedback,
--
Michael Paquier
http://michael.otacoo.com
Attachments:
20121003_reindex_concurrent.patch (application/octet-stream, +1002/-172)
On 3 October 2012 02:14, Michael Paquier <michael.paquier@gmail.com> wrote:
Well, I spent some spare time working on the implementation of REINDEX
CONCURRENTLY.
Thanks
The following restrictions are applied.
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
Fair enough
- indexes for exclusion constraints cannot be reindexed concurrently.
- toast relations are reindexed non-concurrently when table reindex is done
and that this table has toast relations
Those restrictions are important ones to resolve since they prevent
the CONCURRENTLY word from being true in a large proportion of cases.
We need to be clear that the remainder of this can be done in user
space already, so the proposal doesn't move us forwards very far,
except in terms of packaging. IMHO this needs to be more than just
moving a useful script into core.
Here is a description of what happens when reorganizing an index
concurrently
There are four waits for every index, again similar to what is
possible in user space.
When we refactor that, I would like to break things down into N
discrete steps, if possible. Each time we hit a wait barrier, a
top-level process would be able to switch to another task to avoid
waiting. This would then allow us to proceed more quickly through the
task. I would admit that is a later optimisation, but it would be
useful to have the innards refactored to allow for that more easily
later. I'd accept "not yet" if doing that becomes a problem in the short
term.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On Wednesday, October 03, 2012 03:14:17 AM Michael Paquier wrote:
One of the outputs on the discussions about the integration of pg_reorg in
core was that Postgres should provide some ways to do REINDEX, CLUSTER and ALTER
TABLE concurrently with low-level locks in a way similar to CREATE INDEX
CONCURRENTLY. The discussions done can be found on this thread:
http://archives.postgresql.org/pgsql-hackers/2012-09/msg00746.php
Well, I spent some spare time working on the implementation of REINDEX
CONCURRENTLY.
Very cool!
This basically allows to perform read and write operations on a table whose
index(es) are reindexed at the same time. Pretty useful for a production
environment. The caveats of this feature is that it is slower than normal
reindex, and impacts other backends with the extra CPU, memory and IO it
uses to process. The implementation is based on something on the same ideas
as pg_reorg and on an idea of Andres.
The following restrictions are applied.
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
I would like to support something like REINDEX USER TABLES; or similar at some
point, but that very well can be a second phase.
- REINDEX CONCURRENTLY cannot run inside a transaction block.
- toast relations are reindexed non-concurrently when table reindex is done
and that this table has toast relations
Why that restriction?
Here is a description of what happens when reorganizing an index
concurrently
(the beginning of the process is similar to CREATE INDEX CONCURRENTLY):
1) creation of a new index based on the same columns and restrictions as
the index that is rebuilt (called here old index). This new index has as
name $OLDINDEX_cct. So only a suffix _cct is added. It is marked as
invalid and not ready.
You probably should take a SHARE UPDATE EXCLUSIVE lock on the table at that
point already, to prevent schema changes.
8) Take a reference snapshot and validate the new indexes
Hm. Unless you factor in corrupt indices, why should this be needed?
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well because
indexes are referenced by oid at several places. Swapping pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
Such a behaviour would at least be complicated for pg_depend and
pg_constraint.
The following process might be reducible, but I would like that to be
decided depending on the community feedback and experience on such
concurrent features.
For the time being I took an approach that looks slower, but secured to my
mind with multiple waits (perhaps sometimes unnecessary?) and
subtransactions.
If during the process an error occurs, the table will finish with either
the old or new index as invalid. In this case the user will be in charge to
drop the invalid index himself.
The concurrent index can be easily identified with its suffix *_cct.
I am not really happy about relying on some arbitrary naming here. That still
can result in conflicts and such.
This patch has required some refactorisation effort as I noticed that the
code of index for concurrent operations was not very generic. In order to do
that, I created some new functions in index.c called index_concurrent_*
which are used by CREATE INDEX and REINDEX in my patch. Some refactoring has
also been done regarding the wait processes.
REINDEX TABLE and REINDEX INDEX follow the same code path
(ReindexConcurrentIndexes in indexcmds.c). The patch structure is relying a
maximum on the functions of index.c when creating, building and validating
concurrent index.
I haven't looked at the patch yet, but I was pretty sure that you would need
to do quite some refactoring to implement this and this looks like roughly the
right direction...
Thanks, and looking forward to your feedback,
I am very happy that you're taking this on!
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:
This basically allows to perform read and write operations on a table whose
index(es) are reindexed at the same time. Pretty useful for a production
environment. The caveats of this feature is that it is slower than normal
reindex, and impacts other backends with the extra CPU, memory and IO it
uses to process. The implementation is based on something on the same ideas
as pg_reorg and on an idea of Andres.
The following restrictions are applied.
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
I would like to support something like REINDEX USER TABLES; or similar at some
point, but that very well can be a second phase.
Honestly, this is out of scope for the time being. Later? Why not...
- REINDEX CONCURRENTLY cannot run inside a transaction block.
- toast relations are reindexed non-concurrently when table reindex is
done
and that this table has toast relations
Why that restriction?
This is the state of the current version of the patch, and not what the
final version should do. I agree that toast relations should be reindexed
concurrently like the others. Regarding this current restriction, my point
was just to get some feedback before digging deeper. I should have
mentioned that, though...
Here is a description of what happens when reorganizing an index
concurrently
(the beginning of the process is similar to CREATE INDEX CONCURRENTLY):
1) creation of a new index based on the same columns and restrictions as
the index that is rebuilt (called here old index). This new index has as
name $OLDINDEX_cct. So only a suffix _cct is added. It is marked as
invalid and not ready.
You probably should take a SHARE UPDATE EXCLUSIVE lock on the table at that
point already, to prevent schema changes.
8) Take a reference snapshot and validate the new indexes
Hm. Unless you factor in corrupt indices, why should this be needed?
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well
because
indexes are referenced by oid at several places. Swapping
pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
OK, so you mean to create an index, then switch only the relfilenode. Why
not. This is largely doable. I think that what is important here is to
choose a way of doing it and keep it until the end.
Such a behaviour would at least be complicated for pg_depend and
pg_constraint.
The following process might be reducible, but I would like that to be
decided depending on the community feedback and experience on such
concurrent features.
For the time being I took an approach that looks slower, but secured to my
mind with multiple waits (perhaps sometimes unnecessary?) and
subtransactions.
If during the process an error occurs, the table will finish with either
the old or new index as invalid. In this case the user will be in charge to
drop the invalid index himself.
The concurrent index can be easily identified with its suffix *_cct.
I am not really happy about relying on some arbitrary naming here. That
still can result in conflicts and such.
The concurrent names are generated automatically by a function in
indexcmds.c, the same way as pkey index names. Let's imagine that the
reindex concurrently command is run twice after a failure: the second
concurrent index will not have _cct as suffix but _cct1. However, I am open
to more ideas here. What I feel about the concurrent index is that it needs
a pg_class entry, even if it is just temporary, and this entry needs a name.
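The naming behaviour being described can be sketched like this (an illustration only; the real logic lives in indexcmds.c and also has to respect NAMEDATALEN truncation, which this toy version ignores):

```python
def choose_concurrent_name(base, existing, suffix="_cct"):
    """Append the suffix, adding a counter if the name is already taken,
    e.g. a leftover index from a failed run leads to _cct1, _cct2, ..."""
    candidate = base + suffix
    n = 0
    while candidate in existing:
        n += 1
        candidate = base + suffix + str(n)
    return candidate

assert choose_concurrent_name("ind", set()) == "ind_cct"
# a second run after a failure that left "ind_cct" behind:
assert choose_concurrent_name("ind", {"ind_cct"}) == "ind_cct1"
```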
This patch has required some refactorisation effort as I noticed that the
code of index for concurrent operations was not very generic. In order to do
that, I created some new functions in index.c called index_concurrent_*
which are used by CREATE INDEX and REINDEX in my patch. Some refactoring has
also been done regarding the wait processes.
REINDEX TABLE and REINDEX INDEX follow the same code path
(ReindexConcurrentIndexes in indexcmds.c). The patch structure is relying a
maximum on the functions of index.c when creating, building and validating
concurrent index.
I haven't looked at the patch yet, but I was pretty sure that you would need
to do quite some refactoring to implement this and this looks like roughly the
right direction...
Thanks for spending time on it.
--
Michael Paquier
http://michael.otacoo.com
On 3 October 2012 09:10, Andres Freund <andres@2ndquadrant.com> wrote:
The following restrictions are applied.
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
I would like to support something like REINDEX USER TABLES; or similar at some
point, but that very well can be a second phase.
Yes, that would be a nice feature anyway, even without concurrently.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Just for background. The showstopper for REINDEX concurrently was not
that it was particularly hard to actually do the reindexing. But it's
not obvious how to obtain a lock on both the old and new index without
creating a deadlock risk. I don't remember exactly where the deadlock
risk lies, but there are two indexes to lock, and whichever order you
obtain the locks in, it might be possible for someone else to be waiting
to obtain them in the opposite order.
I'm sure it's possible to solve the problem. But the footwork needed
to release locks then reobtain them in the right order and verify that
the index hasn't changed out from under you might be a lot of
headache.
Perhaps a good way to tackle it is to have a generic "verify two
indexes are equivalent and swap the underlying relfilenodes" operation
that can be called from both regular reindex and reindex concurrently.
As long as it's the only function that ever locks two indexes then it
can just determine what locking discipline it wants to use.
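The locking discipline Greg suggests can be sketched in miniature: if the one function that ever holds both index locks always acquires them in a canonical order (say, lowest OID first), two concurrent callers can never wait on each other in opposite orders. A toy Python sketch with threads (the OIDs and function name are made up):

```python
import threading

index_locks = {16384: threading.Lock(), 16385: threading.Lock()}  # fake OIDs

def swap_relfilenodes(oid_a, oid_b, done):
    # Canonical ordering: always lock the lower OID first, whichever
    # index the caller considers "old". This rules out opposite-order waits.
    first, second = sorted((oid_a, oid_b))
    with index_locks[first]:
        with index_locks[second]:
            done.append((oid_a, oid_b))  # stand-in for the actual swap

done = []
t1 = threading.Thread(target=swap_relfilenodes, args=(16384, 16385, done))
t2 = threading.Thread(target=swap_relfilenodes, args=(16385, 16384, done))
t1.start(); t2.start()
t1.join(); t2.join()
assert len(done) == 2  # both calls completed; no deadlock
```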
--
greg
On Wednesday, October 03, 2012 12:59:25 PM Greg Stark wrote:
Just for background. The showstopper for REINDEX concurrently was not
that it was particularly hard to actually do the reindexing. But it's
not obvious how to obtain a lock on both the old and new index without
creating a deadlock risk. I don't remember exactly where the deadlock
risk lies but there are two indexes to lock and whichever order you
obtain the locks it might be possible for someone else to be waiting
to obtain them in the opposite order.I'm sure it's possible to solve the problem. But the footwork needed
to release locks then reobtain them in the right order and verify that
the index hasn't changed out from under you might be a lot of
headache.
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively, which should be "invisible" now
12) drop old index
I don't see where the deadlock danger is hidden in that?
I didn't find anything relevant in a quick search of the archives...
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 3, 2012 at 8:08 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 12:59:25 PM Greg Stark wrote:
Just for background. The showstopper for REINDEX concurrently was not
that it was particularly hard to actually do the reindexing. But it's
not obvious how to obtain a lock on both the old and new index without
creating a deadlock risk. I don't remember exactly where the deadlock
risk lies but there are two indexes to lock and whichever order you
obtain the locks it might be possible for someone else to be waiting
to obtain them in the opposite order.
I'm sure it's possible to solve the problem. But the footwork needed
to release locks then reobtain them in the right order and verify that
the index hasn't changed out from under you might be a lot of headache.
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exlusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
Build new index.
6) process till newindex->indisready (no new locks)
validate new index
7) process till newindex->indisvalid (no new locks)
You forgot the swap of the old and new indexes.
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
The code I sent already does that more or less, btw. It's just that it can
be simplified further...
I don't see where the deadlock danger is hidden in that?
I didn't find anything relevant in a quick search of the archives...
About the deadlock issues, do you mean the case where 2 sessions are
running REINDEX and/or REINDEX CONCURRENTLY on the same table or index in
parallel?
--
Michael Paquier
http://michael.otacoo.com
On Wednesday, October 03, 2012 01:15:27 PM Michael Paquier wrote:
On Wed, Oct 3, 2012 at 8:08 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 12:59:25 PM Greg Stark wrote:
Just for background. The showstopper for REINDEX concurrently was not
that it was particularly hard to actually do the reindexing. But it's
not obvious how to obtain a lock on both the old and new index without
creating a deadlock risk. I don't remember exactly where the deadlock
risk lies but there are two indexes to lock and whichever order you
obtain the locks it might be possible for someone else to be waiting
to obtain them in the opposite order.
I'm sure it's possible to solve the problem. But the footwork needed
to release locks then reobtain them in the right order and verify that
the index hasn't changed out from under you might be a lot of headache.
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
12) drop old index
The code I sent already does that more or less btw. Just that it can be
more simplified...
The above just tried to describe the stuff that's relevant for locking;
maybe I wasn't clear enough on that ;)
I don't see where the deadlock danger is hidden in that?
I didn't find anything relevant in a quick search of the archives...
About the deadlock issues, do you mean the case where 2 sessions are
running REINDEX and/or REINDEX CONCURRENTLY on the same table or index in
parallel?
No idea. The bit about deadlocks originally came from Greg, not me ;)
I guess it's more the interaction with normal sessions, because the locking
used (SHARE UPDATE EXCLUSIVE) prevents another CONCURRENT action running at the
same time. I don't really see the danger there though, because we should never
need to acquire locks that we don't already have, except the final
AccessExclusiveLock, but that's after we dropped other locks and after the index
is made unusable.
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
regards, tom lane
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be dropped. If
we're holding only that one, there shouldn't be any additional deadlock dangers
when dropping the index due to lock upgrades, as we're doing the normal dance
any DROP INDEX does. They seem pretty unlikely on a !valid !ready table
anyway.
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2012/10/03, at 23:52, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be dropped. If
we're holding only that one there shouldn't be any additional deadlock dangers
when dropping the index due to lock upgrades as we're doing the normal dance
any DROP INDEX does. They seem pretty unlikely on a !valid !ready table
Just a note...
My patch drops the locks on parent table and indexes at the end of the
process, after dropping the old indexes ;)
Michael
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wednesday, October 03, 2012 10:12:58 PM Michael Paquier wrote:
On 2012/10/03, at 23:52, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be dropped.
If we're holding only that one there shouldn't be any additional deadlock
dangers when dropping the index due to lock upgrades as we're doing the
normal dance any DROP INDEX does. They seem pretty unlikely on a !valid
!ready table
Just a note...
My patch drops the locks on parent table and indexes at the end of process,
after dropping the old indexes ;)
I think that might result in deadlocks with concurrent sessions in some
circumstances, if those other sessions already have a lower-level lock on the
index. That's why I think dropping the lock on the index and then reacquiring
an access exclusive lock might be necessary.
It's not a very likely scenario, but why not do it right if it's just 3 lines...
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2012/10/04, at 5:41, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 10:12:58 PM Michael Paquier wrote:
On 2012/10/03, at 23:52, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be dropped.
If we're holding only that one there shouldn't be any additional deadlock
dangers when dropping the index due to lock upgrades as we're doing the
normal dance any DROP INDEX does. They seem pretty unlikely on a !valid
!ready table
Just a note...
My patch drops the locks on parent table and indexes at the end of process,
after dropping the old indexes ;)
I think that might result in deadlocks with concurrent sessions in some
circumstances if those other sessions already have a lower-level lock on the
index. That's why I think dropping the lock on the index and then reacquiring
an access exclusive lock might be necessary.
It's not a very likely scenario, but why not do it right if it's just 3 lines...
Tom is right. This scenario does not cover the case where, from a different
session after step 10 and before step 11, you drop the parent table or drop
the index, which is indeed invisible but still has a pg_class and a pg_index
entry. So you cannot drop the locks on the indexes either until you are done
at step 12.
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wednesday, October 03, 2012 11:42:25 PM Michael Paquier wrote:
On 2012/10/04, at 5:41, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 10:12:58 PM Michael Paquier wrote:
On 2012/10/03, at 23:52, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be
dropped. If we're holding only that one there shouldn't be any
additional deadlock dangers when dropping the index due to lock
upgrades as we're doing the normal dance any DROP INDEX does. They seem
pretty unlikely on a !valid !ready table
Just a note...
My patch drops the locks on parent table and indexes at the end of
process, after dropping the old indexes ;)
I think that might result in deadlocks with concurrent sessions in some
circumstances if those other sessions already have a lower-level lock on
the index. That's why I think dropping the lock on the index and then
reacquiring an access exclusive lock might be necessary.
It's not a very likely scenario, but why not do it right if it's just 3
lines...
Tom is right. This scenario does not cover the case where you drop the
parent table or you drop the index, which is indeed invisible, but still
has a pg_class and a pg_index entry, from a different session after step
10 and before step 11. So you cannot drop the locks on the indexes either
until you are done at step 12.
Yep:
Yea, the session lock on the table itself probably shouldn't be dropped.
But that does *not* mean you cannot avoid lock upgrade issues by dropping the
lower-level lock on the index first and only then acquiring the access exclusive
lock. Note that dropping an index always includes *first* getting a lock on the
table, so doing it that way is safe and just the same as a normal DROP INDEX.
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
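The ordering Andres argues for in steps 10-12 above can be sketched as follows. The session-level locks themselves have no direct SQL spelling, so they appear only as comments, and the index name is a hypothetical example:

```sql
-- Sketch of steps 10-12 with the safer ordering discussed above.
-- Session locks (no direct SQL equivalent) are shown as comments:
--   keep the session lock on the parent table throughout;
--   release only the lower-level session lock on the old index.
-- Then acquire ACCESS EXCLUSIVE on the old index. DROP INDEX locks
-- the parent table first, matching a normal DROP INDEX, so no lock
-- upgrade on the index itself is involved.
DROP INDEX old_index;  -- step 12; 'old_index' is a placeholder name
```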
On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well
because
indexes are referenced by oid at several places. Swapping
pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
OK, if there is a choice to be made, switching the relfilenode would be the
better choice as it points to the physical storage itself. It looks more
straightforward than switching OIDs, and makes the switch at the root.
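As a concrete illustration of the distinction, a plain (non-concurrent) REINDEX already assigns a new relfilenode while leaving the index OID untouched; the index name here is an example:

```sql
-- The index OID is how the rest of the system refers to the index;
-- relfilenode identifies the physical storage on disk.
SELECT oid, relfilenode FROM pg_class WHERE relname = 'ind';
REINDEX INDEX ind;
SELECT oid, relfilenode FROM pg_class WHERE relname = 'ind';
-- The oid is unchanged; relfilenode now points at the new storage.
```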
Btw, there is still something I wanted to clarify. In your notes you mention
"old" and "new" indexes, as in: we create a new index at the beginning and
drop the old one at the end. This is not completely true in the case of
switching relfilenode. What happens is that we create a new index with new
physical storage, then at the swap step we switch the old storage and the
new storage. Once the swap is done, the index that needs to be set as
invalid and not ready is not the old index, but the index created at the
beginning of the process, which now has the old relfilenode. The relation
that is dropped at the end of the process is likewise the index with the
old relfilenode, i.e. the index created at the beginning of the process. I
understand that this is playing with words, but I just wanted to confirm
that we are on the same page.
--
Michael Paquier
http://michael.otacoo.com
Michael Paquier <michael.paquier@gmail.com> writes:
On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well
because
indexes are referenced by oid at several places. Swapping
pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
OK, if there is a choice to be made, switching relfilenode would be a
better choice as it points to the physical storage itself. It looks more
straight-forward than switching oids, and takes the switch at the root.
Andres is quite right that "switch by name" is out of the question ---
for the most part, the system pays no attention to index names at all.
It just gets a list of the OIDs of indexes belonging to a table and
works with that.
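For instance, the list of indexes belonging to a table comes out of the catalogs keyed by OID; the table name below is an example:

```sql
-- Index lookup is OID-based via pg_index; the index names play no
-- role in how the system associates indexes with their table.
SELECT indexrelid, indexrelid::regclass AS index_name
FROM pg_index
WHERE indrelid = 'tab'::regclass;
```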
However, I'm pretty suspicious of the idea of switching relfilenodes as
well. You generally can't change the relfilenode of a relation (either
a table or an index) without taking an exclusive lock on it, because
changing the relfilenode *will* break any concurrent operations on the
index. And there is not anyplace in the proposed sequence where it's
okay to have exclusive lock on both indexes, at least not if the goal
is to not block concurrent updates at any time.
I think what you'd have to do is drop the old index (relying on the
assumption that no one is accessing it anymore after a certain point, so
you can take exclusive lock on it now) and then rename the new index
to have the old index's name. However, renaming an index without
exclusive lock on it still seems a bit risky. Moreover, what if you
crash right after committing the drop of the old index?
I'm really not convinced that we have a bulletproof solution yet,
at least not if you insist on the replacement index having the same name
as the original. How badly do we need that?
regards, tom lane
On 2012/10/04, at 10:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well
because
indexes are referenced by oid at several places. Swapping
pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
OK, if there is a choice to be made, switching relfilenode would be a
better choice as it points to the physical storage itself. It looks more
straight-forward than switching oids, and takes the switch at the root.
Andres is quite right that "switch by name" is out of the question ---
for the most part, the system pays no attention to index names at all.
It just gets a list of the OIDs of indexes belonging to a table and
works with that.
Sure. The switching being done by changing the index name is just the direction taken by the first version of the patch, and only that. I wrote this version without really looking for a bulletproof solution, but only as something to discuss.
However, I'm pretty suspicious of the idea of switching relfilenodes as
well. You generally can't change the relfilenode of a relation (either
a table or an index) without taking an exclusive lock on it, because
changing the relfilenode *will* break any concurrent operations on the
index. And there is not anyplace in the proposed sequence where it's
okay to have exclusive lock on both indexes, at least not if the goal
is to not block concurrent updates at any time.
Ok. As the goal is to allow concurrent operations, this is not reliable either. So what remains is the method of switching the OIDs of the old and new indexes in pg_index? Any other candidates?
I think what you'd have to do is drop the old index (relying on the
assumption that no one is accessing it anymore after a certain point, so
you can take exclusive lock on it now) and then rename the new index
to have the old index's name. However, renaming an index without
exclusive lock on it still seems a bit risky. Moreover, what if you
crash right after committing the drop of the old index?
I'm really not convinced that we have a bulletproof solution yet,
at least not if you insist on the replacement index having the same name as the original. How badly do we need that?
And we do not really need such a solution, as I am not insisting on the method that switches indexes by changing names. I am open to a reliable and robust method, and I hope that method can be decided in this thread.
Thanks for those arguments; I feel they are leading the discussion in a good direction.
Thanks.
Michael
On Thu, Oct 4, 2012 at 2:19 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
I think what you'd have to do is drop the old index (relying on the
assumption that no one is accessing it anymore after a certain point, so
you can take exclusive lock on it now) and then rename the new index
to have the old index's name. However, renaming an index without
exclusive lock on it still seems a bit risky. Moreover, what if you
crash right after committing the drop of the old index?
I think this would require a new state which is the converse of
indisvalid=f. Right now there's no state the index can be in that
means the index should be ignored for both scans and maintenance but
might still have old sessions using it or maintaining it.
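The existing flags Greg refers to can be seen directly in pg_index; today there are only two booleans, and neither expresses "ignore for scans and maintenance, but possibly still in use by old sessions". The table name below is an example:

```sql
-- indisvalid: usable by query scans; indisready: maintained by writes.
-- There is no third flag for a "being dropped concurrently" state.
SELECT indexrelid::regclass, indisvalid, indisready
FROM pg_index
WHERE indrelid = 'tab'::regclass;
```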
I'm a bit puzzled why we're so afraid of swapping the relfilenodes
when that's what the current REINDEX does. It seems flaky to have two
different mechanisms depending on which mode is being used. It seems
more conservative to use the same mechanism and just figure out what's
required to ensure it's safe in both modes. At least there won't be
any bugs from unexpected consequences that aren't locking related if
it's using the same mechanics.
--
greg
Greg Stark <stark@mit.edu> writes:
I'm a bit puzzled why we're so afraid of swapping the relfilenodes
when that's what the current REINDEX does.
Swapping the relfilenodes is fine *as long as you have exclusive lock*.
The trick is to make it safe without that. It will definitely not work
to do that without exclusive lock, because at the instant you would try
it, people will be accessing the new index (by OID).
regards, tom lane
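Put differently, plain REINDEX can swap the relfilenode precisely because it already holds the strongest lock; the index name is an example:

```sql
-- Plain REINDEX takes an ACCESS EXCLUSIVE lock on the index, blocking
-- all concurrent access, so nobody can observe the relfilenode change.
REINDEX INDEX ind;
-- The concurrent variant under discussion cannot take that lock at the
-- swap point, which is exactly why the swap is unsafe there.
```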