Support for REINDEX CONCURRENTLY
Hi all,
One of the outcomes of the discussions about the integration of pg_reorg in
core was that Postgres should provide some way to do REINDEX, CLUSTER and ALTER
TABLE concurrently with low-level locks, in a way similar to CREATE INDEX
CONCURRENTLY.
The discussion can be found in this thread:
http://archives.postgresql.org/pgsql-hackers/2012-09/msg00746.php
Well, I spent some spare time working on the implementation of REINDEX
CONCURRENTLY.
This basically allows read and write operations on a table whose
index(es) are being reindexed at the same time. Pretty useful for a production
environment. The caveats of this feature are that it is slower than a normal
reindex, and that it impacts other backends with the extra CPU, memory and
I/O it uses. The implementation is based on the same ideas as pg_reorg and
on an idea of Andres.
Please find attached a version that I consider a base for the next
discussions, perhaps a version that could be submitted to the commit fest
next month. The patch is aligned with postgres master at commit 09ac603.
With this feature, you can rebuild a table or an index with such commands:
REINDEX INDEX ind CONCURRENTLY;
REINDEX TABLE tab CONCURRENTLY;
The following restrictions apply:
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
- REINDEX CONCURRENTLY cannot run inside a transaction block.
- Shared tables cannot be reindexed concurrently.
- Indexes for exclusion constraints cannot be reindexed concurrently.
- Toast relations are reindexed non-concurrently when a table reindex is
run on a table that has toast relations.
Here is a description of what happens when reorganizing an index
concurrently
(the beginning of the process is similar to CREATE INDEX CONCURRENTLY):
1) Creation of a new index based on the same columns and restrictions as the
index being rebuilt (called here the old index). The new index is named
$OLDINDEX_cct, i.e. only a suffix _cct is added. It is marked as invalid and
not ready.
2) Take session locks on the old and new index(es), and on the parent table,
to prevent unfortunate drops.
3) Commit and start a new transaction
4) Wait until no running transactions could have the table open with the
old list of indexes.
5) Build the new indexes. All the new indexes are marked as indisready.
6) Commit and start a new transaction
7) Wait until no running transactions could have the table open with the
old list of indexes.
8) Take a reference snapshot and validate the new indexes
9) Wait for the old snapshots based on the reference snapshot
10) mark the new indexes as indisvalid
11) Commit and start a new transaction. At this point the old and new
indexes are both valid
12) Take a new reference snapshot and wait for the old snapshots, to ensure
that the old indexes are not corrupted.
13) Mark the old indexes as invalid
14) Swap new and old indexes, consisting here in switching their names.
15) Old indexes are marked as invalid.
16) Commit and start a new transaction
17) Wait for transactions that might use the old indexes
18) Old indexes are marked as not ready
19) Commit and start a new transaction
20) Drop the old indexes
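For readers skimming the thread, the flag transitions in the steps above can be sketched as a toy state machine. This is plain Python, not the patch's C code; the step numbers in the comments refer to the list above, and the helper names are made up for illustration:

```python
# Toy sketch of how the index flags evolve across the phases above.
# indisready: the index receives inserts/updates; indisvalid: the
# planner may use it for scans.

class Index:
    def __init__(self, ready, valid):
        self.ready = ready
        self.valid = valid

def reindex_concurrently_flags():
    old = Index(ready=True, valid=True)     # the index being rebuilt
    new = Index(ready=False, valid=False)   # step 1: created invalid, not ready

    new.ready = True                        # step 5: built, now receives writes
    new.valid = True                        # step 10: validated
    both_valid_window = old.valid and new.valid  # step 11: both usable at once

    old.valid = False                       # steps 13/15: planner stops using old
    old.ready = False                       # step 18: old stops receiving writes
    return old, new, both_valid_window      # step 20 would then drop `old`

old, new, window = reindex_concurrently_flags()
assert new.ready and new.valid          # new index fully usable
assert not old.ready and not old.valid  # old index safe to drop
assert window  # there is a window where both indexes are valid
```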
The following process might be reducible, but I would like that to be
decided based on community feedback and experience with such concurrent
features. For the time being I took an approach that looks slower but is,
to my mind, safer, with multiple waits (perhaps sometimes unnecessary?) and
subtransactions.
If an error occurs during the process, the table will finish with either
the old or the new index marked as invalid. In this case the user is in
charge of dropping the invalid index himself.
The concurrent index can easily be identified by its suffix *_cct.
This patch has required some refactoring effort, as I noticed that the
index code for concurrent operations was not very generic. To address that,
I created some new functions in index.c called index_concurrent_*, which
are used by both CREATE INDEX and REINDEX in my patch. Some refactoring has
also been done regarding the wait processes.
REINDEX TABLE and REINDEX INDEX follow the same code path
(ReindexConcurrentIndexes in indexcmds.c). The patch relies as much as
possible on the functions of index.c when creating, building and validating
a concurrent index.
Based on the comments of this thread, I would like to submit the patch at
the next commit fest. Just let me know if the approach taken by the current
implementation is OK or if it needs some modifications. That would be
really helpful.
The patch includes some regression tests for error checks and also some
documentation. Regression tests pass, and the code has no trailing
whitespace and no compilation warnings.
I have also run tests checking read and write operations on index scans of
the parent table at each step of the process (by using gdb to stop the
reindex process at precise places).
Thanks, and looking forward to your feedback,
--
Michael Paquier
http://michael.otacoo.com
Attachments:
20121003_reindex_concurrent.patch (application/octet-stream, +1002/-172)
On 3 October 2012 02:14, Michael Paquier <michael.paquier@gmail.com> wrote:
Well, I spent some spare time working on the implementation of REINDEX
CONCURRENTLY.
Thanks
The following restrictions are applied.
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
Fair enough
- indexes for exclusion constraints cannot be reindexed concurrently.
- toast relations are reindexed non-concurrently when table reindex is done
and that this table has toast relations
Those restrictions are important ones to resolve since they prevent
the CONCURRENTLY word from being true in a large proportion of cases.
We need to be clear that the remainder of this can be done in user
space already, so the proposal doesn't move us forwards very far,
except in terms of packaging. IMHO this needs to be more than just
moving a useful script into core.
Here is a description of what happens when reorganizing an index
concurrently
There are four waits for every index, again similar to what is
possible in user space.
When we refactor that, I would like to break things down into N
discrete steps, if possible. Each time we hit a wait barrier, a
top-level process would be able to switch to another task to avoid
waiting. This would then allow us to proceed more quickly through the
task. I would admit that is a later optimisation, but it would be
useful to have the innards refactored to allow for that more easily
later. I'd accept "not yet" if doing that becomes a problem in the short
term.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On Wednesday, October 03, 2012 03:14:17 AM Michael Paquier wrote:
One of the outputs on the discussions about the integration of pg_reorg in
core was that Postgres should provide some ways to do REINDEX, CLUSTER and ALTER
TABLE concurrently with low-level locks in a way similar to CREATE INDEX
CONCURRENTLY. The discussions done can be found on this thread:
http://archives.postgresql.org/pgsql-hackers/2012-09/msg00746.php
Well, I spent some spare time working on the implementation of REINDEX
CONCURRENTLY.
Very cool!
This basically allows to perform read and write operations on a table whose
index(es) are reindexed at the same time. Pretty useful for a production
environment. The caveats of this feature is that it is slower than normal
reindex, and impacts other backends with the extra CPU, memory and IO it
uses to process. The implementation is based on something on the same ideas
as pg_reorg and on an idea of Andres.
The following restrictions are applied.
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
I would like to support something like REINDEX USER TABLES; or similar at some
point, but that very well can be a second phase.
- REINDEX CONCURRENTLY cannot run inside a transaction block.
- toast relations are reindexed non-concurrently when table reindex is done
and that this table has toast relations
Why that restriction?
Here is a description of what happens when reorganizing an index
concurrently
(the beginning of the process is similar to CREATE INDEX CONCURRENTLY):
1) creation of a new index based on the same columns and restrictions as
the index that is rebuilt (called here old index). This new index has as
name $OLDINDEX_cct. So only a suffix _cct is added. It is marked as
invalid and not ready.
You probably should take a SHARE UPDATE EXCLUSIVE lock on the table at that
point already, to prevent schema changes.
8) Take a reference snapshot and validate the new indexes
Hm. Unless you factor in corrupt indices, why should this be needed?
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well because
indexes are referenced by oid at several places. Swapping pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
Such a behaviour would at least be complicated for pg_depend and
pg_constraint.
The following process might be reducible, but I would like that to be
decided depending on the community feedback and experience on such
concurrent features.
For the time being I took an approach that looks slower, but secured to my
mind with multiple waits (perhaps sometimes unnecessary?) and
subtransactions.
If during the process an error occurs, the table will finish with either
the old or new index as invalid. In this case the user will be in charge to
drop the invalid index himself.
The concurrent index can be easily identified with its suffix *_cct.
I am not really happy about relying on some arbitrary naming here. That still
can result in conflicts and such.
This patch has required some refactorisation effort as I noticed that the
code of index for concurrent operations was not very generic. In order to do
that, I created some new functions in index.c called index_concurrent_*
which are used by CREATE INDEX and REINDEX in my patch. Some refactoring has
also been done regarding the wait processes.
REINDEX TABLE and REINDEX INDEX follow the same code path
(ReindexConcurrentIndexes in indexcmds.c). The patch structure is relying a
maximum on the functions of index.c when creating, building and validating
concurrent index.
I haven't looked at the patch yet, but I was pretty sure that you would need
to do quite some refactoring to implement this and this looks like roughly the
right direction...
Thanks, and looking forward to your feedback,
I am very happy that you're taking this on!
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:
This basically allows to perform read and write operations on a table whose
index(es) are reindexed at the same time. Pretty useful for a production
environment. The caveats of this feature is that it is slower than normal
reindex, and impacts other backends with the extra CPU, memory and IO it
uses to process. The implementation is based on something on the same ideas
as pg_reorg and on an idea of Andres.
The following restrictions are applied.
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
I would like to support something like REINDEX USER TABLES; or similar at some
point, but that very well can be a second phase.
Honestly, this is out of scope for the time being. Later? Why not...
- REINDEX CONCURRENTLY cannot run inside a transaction block.
- toast relations are reindexed non-concurrently when table reindex is
done
and that this table has toast relations
Why that restriction?
This is the state of the current version of the patch, and not what the
final version should do. I agree that toast relations should be reindexed
concurrently like the others. Regarding this current restriction, my point
was just to get some feedback before digging deeper. I should have
mentioned that, though...
Here is a description of what happens when reorganizing an index
concurrently
(the beginning of the process is similar to CREATE INDEX CONCURRENTLY):
1) creation of a new index based on the same columns and restrictions as
the index that is rebuilt (called here old index). This new index has as
name $OLDINDEX_cct. So only a suffix _cct is added. It is marked as
invalid and not ready.
You probably should take a SHARE UPDATE EXCLUSIVE lock on the table at that
point already, to prevent schema changes.
8) Take a reference snapshot and validate the new indexes
Hm. Unless you factor in corrupt indices, why should this be needed?
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well
because
indexes are referenced by oid at several places. Swapping
pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
OK, so you mean to create an index, then switch only the relfilenode. Why
not. This is largely doable. I think that what is important here is to
choose a way of doing it and keep it until the end.
Such a behaviour would at least be complicated for pg_depend and
pg_constraint.
The following process might be reducible, but I would like that to be
decided depending on the community feedback and experience on such
concurrent features.
For the time being I took an approach that looks slower, but secured to my
mind with multiple waits (perhaps sometimes unnecessary?) and
subtransactions.
If during the process an error occurs, the table will finish with either
the old or new index as invalid. In this case the user will be in charge to
drop the invalid index himself.
The concurrent index can be easily identified with its suffix *_cct.
I am not really happy about relying on some arbitrary naming here. That
still can result in conflicts and such.
The concurrent names are generated automatically by a function in
indexcmds.c, the same way as pkey index names. Let's imagine that the
reindex concurrently command is run twice after a failure: the second
concurrent index will not have _cct as suffix but _cct1. However, I am open
to more ideas here. What I feel about the concurrent index is that it needs
a pg_class entry, even if it is just temporary, and this entry needs a name.
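The naming behaviour being described can be sketched like this (an illustration only; the real logic lives in indexcmds.c and also has to respect NAMEDATALEN truncation, which this toy version ignores):

```python
def choose_concurrent_name(base, existing, suffix="_cct"):
    """Append the suffix, adding a counter if the name is already taken,
    e.g. a leftover index from a failed run leads to _cct1, _cct2, ..."""
    candidate = base + suffix
    n = 0
    while candidate in existing:
        n += 1
        candidate = base + suffix + str(n)
    return candidate

assert choose_concurrent_name("ind", set()) == "ind_cct"
# a second run after a failure that left "ind_cct" behind:
assert choose_concurrent_name("ind", {"ind_cct"}) == "ind_cct1"
```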
This patch has required some refactorisation effort as I noticed that the
code of index for concurrent operations was not very generic. In order to do
that, I created some new functions in index.c called index_concurrent_*
which are used by CREATE INDEX and REINDEX in my patch. Some refactoring has
also been done regarding the wait processes.
REINDEX TABLE and REINDEX INDEX follow the same code path
(ReindexConcurrentIndexes in indexcmds.c). The patch structure is relying a
maximum on the functions of index.c when creating, building and validating
concurrent index.
I haven't looked at the patch yet, but I was pretty sure that you would need
to do quite some refactoring to implement this and this looks like roughly the
right direction...
Thanks for spending time on it.
--
Michael Paquier
http://michael.otacoo.com
On 3 October 2012 09:10, Andres Freund <andres@2ndquadrant.com> wrote:
The following restrictions are applied.
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
I would like to support something like REINDEX USER TABLES; or similar at some
point, but that very well can be a second phase.
Yes, that would be a nice feature anyway, even without concurrently.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Just for background. The showstopper for REINDEX concurrently was not
that it was particularly hard to actually do the reindexing. But it's
not obvious how to obtain a lock on both the old and new index without
creating a deadlock risk. I don't remember exactly where the deadlock
risk lies, but there are two indexes to lock, and whichever order you
obtain the locks in, it might be possible for someone else to be waiting
to obtain them in the opposite order.
I'm sure it's possible to solve the problem. But the footwork needed
to release locks then reobtain them in the right order and verify that
the index hasn't changed out from under you might be a lot of
headache.
Perhaps a good way to tackle it is to have a generic "verify two
indexes are equivalent and swap the underlying relfilenodes" operation
that can be called from both regular reindex and reindex concurrently.
As long as it's the only function that ever locks two indexes then it
can just determine what locking discipline it wants to use.
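The locking discipline Greg suggests can be sketched in miniature: if the one function that ever holds both index locks always acquires them in a canonical order (say, lowest OID first), two concurrent callers can never wait on each other in opposite orders. A toy Python sketch with threads (the OIDs and function name are made up):

```python
import threading

index_locks = {16384: threading.Lock(), 16385: threading.Lock()}  # fake OIDs

def swap_relfilenodes(oid_a, oid_b, done):
    # Canonical ordering: always lock the lower OID first, whichever
    # index the caller considers "old". This rules out opposite-order waits.
    first, second = sorted((oid_a, oid_b))
    with index_locks[first]:
        with index_locks[second]:
            done.append((oid_a, oid_b))  # stand-in for the actual swap

done = []
t1 = threading.Thread(target=swap_relfilenodes, args=(16384, 16385, done))
t2 = threading.Thread(target=swap_relfilenodes, args=(16385, 16384, done))
t1.start(); t2.start()
t1.join(); t2.join()
assert len(done) == 2  # both calls completed; no deadlock
```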
--
greg
On Wednesday, October 03, 2012 12:59:25 PM Greg Stark wrote:
Just for background. The showstopper for REINDEX concurrently was not
that it was particularly hard to actually do the reindexing. But it's
not obvious how to obtain a lock on both the old and new index without
creating a deadlock risk. I don't remember exactly where the deadlock
risk lies but there are two indexes to lock and whichever order you
obtain the locks it might be possible for someone else to be waiting
to obtain them in the opposite order.I'm sure it's possible to solve the problem. But the footwork needed
to release locks then reobtain them in the right order and verify that
the index hasn't changed out from under you might be a lot of
headache.
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively, which should be "invisible" now
12) drop old index
I don't see where the deadlock danger is hidden in that?
I didn't find anything relevant in a quick search of the archives...
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 3, 2012 at 8:08 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 12:59:25 PM Greg Stark wrote:
Just for background. The showstopper for REINDEX concurrently was not
that it was particularly hard to actually do the reindexing. But it's
not obvious how to obtain a lock on both the old and new index without
creating a deadlock risk. I don't remember exactly where the deadlock
risk lies but there are two indexes to lock and whichever order you
obtain the locks it might be possible for someone else to be waiting
to obtain them in the opposite order.
I'm sure it's possible to solve the problem. But the footwork needed
to release locks then reobtain them in the right order and verify that
the index hasn't changed out from under you might be a lot of headache.
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exlusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
Build new index.
6) process till newindex->indisready (no new locks)
validate new index
7) process till newindex->indisvalid (no new locks)
You forgot the swap of the old and new indexes.
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
The code I sent already does that more or less, btw. It's just that it can
be simplified further...
I don't see where the deadlock danger is hidden in that?
I didn't find anything relevant in a quick search of the archives...
About the deadlock issues, do you mean the case where 2 sessions are
running REINDEX and/or REINDEX CONCURRENTLY on the same table or index in
parallel?
--
Michael Paquier
http://michael.otacoo.com
On Wednesday, October 03, 2012 01:15:27 PM Michael Paquier wrote:
On Wed, Oct 3, 2012 at 8:08 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 12:59:25 PM Greg Stark wrote:
Just for background. The showstopper for REINDEX concurrently was not
that it was particularly hard to actually do the reindexing. But it's
not obvious how to obtain a lock on both the old and new index without
creating a deadlock risk. I don't remember exactly where the deadlock
risk lies but there are two indexes to lock and whichever order you
obtain the locks it might be possible for someone else to be waiting
to obtain them in the opposite order.
I'm sure it's possible to solve the problem. But the footwork needed
to release locks then reobtain them in the right order and verify that
the index hasn't changed out from under you might be a lot of headache.
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
12) drop old index
The code I sent already does that more or less btw. Just that it can be
more simplified...
The above just tried to describe the stuff that's relevant for locking;
maybe I wasn't clear enough on that ;)
I don't see where the deadlock danger is hidden in that?
I didn't find anything relevant in a quick search of the archives...
About the deadlock issues, do you mean the case where 2 sessions are
running REINDEX and/or REINDEX CONCURRENTLY on the same table or index in
parallel?
No idea. The bit about deadlocks originally came from Greg, not me ;)
I guess it's more the interaction with normal sessions, because the locking
used (SHARE UPDATE EXCLUSIVE) prevents another CONCURRENT action running at the
same time. I don't really see the danger there though, because we should never
need to acquire locks that we don't already have, except the final
AccessExclusiveLock, but that's after we dropped other locks and after the index
is made unusable.
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
regards, tom lane
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be dropped. If
we're holding only that one, there shouldn't be any additional deadlock dangers
when dropping the index due to lock upgrades, as we're doing the normal dance
any DROP INDEX does. They seem pretty unlikely on a !valid !ready table
anyway.
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2012/10/03, at 23:52, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be dropped. If
we're holding only that one there shouldn't be any additional deadlock dangers
when dropping the index due to lock upgrades as we're doing the normal dance
any DROP INDEX does. They seem pretty unlikely on a !valid !ready table
Just a note...
My patch drops the locks on parent table and indexes at the end of the
process, after dropping the old indexes ;)
Michael
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wednesday, October 03, 2012 10:12:58 PM Michael Paquier wrote:
On 2012/10/03, at 23:52, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be dropped.
If we're holding only that one there shouldn't be any additional deadlock
dangers when dropping the index due to lock upgrades as we're doing the
normal dance any DROP INDEX does. They seem pretty unlikely on a !valid
!ready table
Just a note...
My patch drops the locks on parent table and indexes at the end of process,
after dropping the old indexes ;)
I think that might result in deadlocks with concurrent sessions in some
circumstances, if those other sessions already have a lower-level lock on the
index. That's why I think dropping the lock on the index and then reacquiring
an access exclusive lock might be necessary.
It's not a very likely scenario, but why not do it right if it's just 3 lines...
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2012/10/04, at 5:41, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 10:12:58 PM Michael Paquier wrote:
On 2012/10/03, at 23:52, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be dropped.
If we're holding only that one there shouldn't be any additional deadlock
dangers when dropping the index due to lock upgrades as we're doing the
normal dance any DROP INDEX does. They seem pretty unlikely on a !valid
!ready table
Just a note...
My patch drops the locks on parent table and indexes at the end of process,
after dropping the old indexes ;)
I think that might result in deadlocks with concurrent sessions in some
circumstances if those other sessions already have a lower-level lock on the
index. That's why I think dropping the lock on the index and then reacquiring
an access exclusive lock might be necessary.
It's not a very likely scenario, but why not do it right if it's just 3 lines...
Tom is right. This scenario does not cover the case where, from a different
session after step 10 and before step 11, you drop the parent table or drop
the index, which is indeed invisible but still has a pg_class and a pg_index
entry. So you cannot drop the locks on the indexes either until you are done
at step 12.
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wednesday, October 03, 2012 11:42:25 PM Michael Paquier wrote:
On 2012/10/04, at 5:41, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 10:12:58 PM Michael Paquier wrote:
On 2012/10/03, at 23:52, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be
dropped. If we're holding only that one there shouldn't be any
additional deadlock dangers when dropping the index due to lock
upgrades as we're doing the normal dance any DROP INDEX does. They seem
pretty unlikely on a !valid !ready table
Just a note...
My patch drops the locks on parent table and indexes at the end of
process, after dropping the old indexes ;)
I think that might result in deadlocks with concurrent sessions in some
circumstances if those other sessions already have a lower-level lock on
the index. That's why I think dropping the lock on the index and then
reacquiring an access exclusive lock might be necessary.
It's not a very likely scenario, but why not do it right if it's just 3
lines...
Tom is right. This scenario does not cover the case where you drop the
parent table or you drop the index, which is indeed invisible, but still
has a pg_class and a pg_index entry, from a different session after step
10 and before step 11. So you cannot drop the locks on the indexes either
until you are done at step 12.
Yep:
Yea, the session lock on the table itself probably shouldn't be dropped.
But that does *not* mean you cannot avoid lock upgrade issues by dropping the
lower-level lock on the index first and only then acquiring the access exclusive
lock. Note that dropping an index always includes *first* getting a lock on the
table, so doing it that way is safe and just the same as a normal DROP INDEX.
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
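The ordering Andres argues for in steps 10-12 above can be sketched as follows. The session-level locks themselves have no direct SQL spelling, so they appear only as comments, and the index name is a hypothetical example:

```sql
-- Sketch of steps 10-12 with the safer ordering discussed above.
-- Session locks (no direct SQL equivalent) are shown as comments:
--   keep the session lock on the parent table throughout;
--   release only the lower-level session lock on the old index.
-- Then acquire ACCESS EXCLUSIVE on the old index. DROP INDEX locks
-- the parent table first, matching a normal DROP INDEX, so no lock
-- upgrade on the index itself is involved.
DROP INDEX old_index;  -- step 12; 'old_index' is a placeholder name
```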
On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well
because
indexes are referenced by oid at several places. Swapping
pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
OK, if there is a choice to be made, switching the relfilenode would be the
better choice as it points to the physical storage itself. It looks more
straightforward than switching OIDs, and makes the switch at the root.
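As a concrete illustration of the distinction, a plain (non-concurrent) REINDEX already assigns a new relfilenode while leaving the index OID untouched; the index name here is an example:

```sql
-- The index OID is how the rest of the system refers to the index;
-- relfilenode identifies the physical storage on disk.
SELECT oid, relfilenode FROM pg_class WHERE relname = 'ind';
REINDEX INDEX ind;
SELECT oid, relfilenode FROM pg_class WHERE relname = 'ind';
-- The oid is unchanged; relfilenode now points at the new storage.
```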
Btw, there is still something I wanted to clarify. In your notes you mention
"old" and "new" indexes, as in: we create a new index at the beginning and
drop the old one at the end. This is not completely true in the case of
switching relfilenode. What happens is that we create a new index with new
physical storage, then at the swap step we switch the old storage and the
new storage. Once the swap is done, the index that needs to be set as
invalid and not ready is not the old index, but the index created at the
beginning of the process, which now has the old relfilenode. The relation
that is dropped at the end of the process is likewise the index with the
old relfilenode, i.e. the index created at the beginning of the process. I
understand that this is playing with words, but I just wanted to confirm
that we are on the same page.
--
Michael Paquier
http://michael.otacoo.com
Michael Paquier <michael.paquier@gmail.com> writes:
On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well
because
indexes are referenced by oid at several places. Swapping
pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
OK, if there is a choice to be made, switching relfilenode would be a
better choice as it points to the physical storage itself. It looks more
straight-forward than switching oids, and takes the switch at the root.
Andres is quite right that "switch by name" is out of the question ---
for the most part, the system pays no attention to index names at all.
It just gets a list of the OIDs of indexes belonging to a table and
works with that.
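For instance, the list of indexes belonging to a table comes out of the catalogs keyed by OID; the table name below is an example:

```sql
-- Index lookup is OID-based via pg_index; the index names play no
-- role in how the system associates indexes with their table.
SELECT indexrelid, indexrelid::regclass AS index_name
FROM pg_index
WHERE indrelid = 'tab'::regclass;
```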
However, I'm pretty suspicious of the idea of switching relfilenodes as
well. You generally can't change the relfilenode of a relation (either
a table or an index) without taking an exclusive lock on it, because
changing the relfilenode *will* break any concurrent operations on the
index. And there is not anyplace in the proposed sequence where it's
okay to have exclusive lock on both indexes, at least not if the goal
is to not block concurrent updates at any time.
I think what you'd have to do is drop the old index (relying on the
assumption that no one is accessing it anymore after a certain point, so
you can take exclusive lock on it now) and then rename the new index
to have the old index's name. However, renaming an index without
exclusive lock on it still seems a bit risky. Moreover, what if you
crash right after committing the drop of the old index?
I'm really not convinced that we have a bulletproof solution yet,
at least not if you insist on the replacement index having the same name
as the original. How badly do we need that?
regards, tom lane
On 2012/10/04, at 10:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well
because
indexes are referenced by oid at several places. Swapping
pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
OK, if there is a choice to be made, switching relfilenode would be a
better choice as it points to the physical storage itself. It looks more
straight-forward than switching oids, and takes the switch at the root.
Andres is quite right that "switch by name" is out of the question ---
for the most part, the system pays no attention to index names at all.
It just gets a list of the OIDs of indexes belonging to a table and
works with that.
Sure. The switching being done by changing the index name is just the direction taken by the first version of the patch, and only that. I wrote this version without really looking for a bulletproof solution, but only as something to discuss.
However, I'm pretty suspicious of the idea of switching relfilenodes as
well. You generally can't change the relfilenode of a relation (either
a table or an index) without taking an exclusive lock on it, because
changing the relfilenode *will* break any concurrent operations on the
index. And there is not anyplace in the proposed sequence where it's
okay to have exclusive lock on both indexes, at least not if the goal
is to not block concurrent updates at any time.
Ok. As the goal is to allow concurrent operations, this is not reliable either. So what remains is the method of switching the OIDs of the old and new indexes in pg_index? Any other candidates?
I think what you'd have to do is drop the old index (relying on the
assumption that no one is accessing it anymore after a certain point, so
you can take exclusive lock on it now) and then rename the new index
to have the old index's name. However, renaming an index without
exclusive lock on it still seems a bit risky. Moreover, what if you
crash right after committing the drop of the old index?
I'm really not convinced that we have a bulletproof solution yet,
at least not if you insist on the replacement index having the same name as the original. How badly do we need that?
And we do not really need such a solution, as I am not insisting on the method that switches indexes by changing names. I am open to a reliable and robust method, and I hope that method can be decided in this thread.
Thanks for those arguments; I feel they are leading the discussion in a good direction.
Thanks.
Michael
On Thu, Oct 4, 2012 at 2:19 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
I think what you'd have to do is drop the old index (relying on the
assumption that no one is accessing it anymore after a certain point, so
you can take exclusive lock on it now) and then rename the new index
to have the old index's name. However, renaming an index without
exclusive lock on it still seems a bit risky. Moreover, what if you
crash right after committing the drop of the old index?
I think this would require a new state which is the converse of
indisvalid=f. Right now there's no state the index can be in that
means the index should be ignored for both scans and maintenance but
might still have old sessions using it or maintaining it.
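The existing flags Greg refers to can be seen directly in pg_index; today there are only two booleans, and neither expresses "ignore for scans and maintenance, but possibly still in use by old sessions". The table name below is an example:

```sql
-- indisvalid: usable by query scans; indisready: maintained by writes.
-- There is no third flag for a "being dropped concurrently" state.
SELECT indexrelid::regclass, indisvalid, indisready
FROM pg_index
WHERE indrelid = 'tab'::regclass;
```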
I'm a bit puzzled why we're so afraid of swapping the relfilenodes
when that's what the current REINDEX does. It seems flaky to have two
different mechanisms depending on which mode is being used. It seems
more conservative to use the same mechanism and just figure out what's
required to ensure it's safe in both modes. At least there won't be
any bugs from unexpected consequences that aren't locking related if
it's using the same mechanics.
--
greg
Greg Stark <stark@mit.edu> writes:
I'm a bit puzzled why we're so afraid of swapping the relfilenodes
when that's what the current REINDEX does.
Swapping the relfilenodes is fine *as long as you have exclusive lock*.
The trick is to make it safe without that. It will definitely not work
to do that without exclusive lock, because at the instant you would try
it, people will be accessing the new index (by OID).
regards, tom lane
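Put differently, plain REINDEX can swap the relfilenode precisely because it already holds the strongest lock; the index name is an example:

```sql
-- Plain REINDEX takes an ACCESS EXCLUSIVE lock on the index, blocking
-- all concurrent access, so nobody can observe the relfilenode change.
REINDEX INDEX ind;
-- The concurrent variant under discussion cannot take that lock at the
-- swap point, which is exactly why the swap is unsafe there.
```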